Job Title: Voice Processing Specialist
Location: Remote / Jaipur
Job Type: Full-time / Contract
Experience: 3+ years of expertise in voice cloning, transformation, and synthesis technologies
Job Summary
We are seeking a talented and motivated Voice Processing Specialist to join our team and lead the development of innovative voice technologies. The ideal candidate will have a deep understanding of speech synthesis, voice cloning, and transformation techniques. You will play a critical role in designing, implementing, and deploying state-of-the-art voice models that enhance naturalness, personalization, and flexibility of speech in AI-powered applications. This role is perfect for someone passionate about advancing human-computer voice interaction and creating lifelike, adaptive voice systems.
Key Responsibilities
- Design, develop, and optimize advanced deep learning models for voice cloning, text-to-speech (TTS), voice conversion, and real-time voice transformation.
- Implement speaker embedding and voice identity preservation techniques to support accurate and high-fidelity voice replication.
- Work with large-scale and diverse audio datasets, including preprocessing, segmentation, normalization, and data augmentation to improve model generalization and robustness.
- Collaborate closely with data scientists, ML engineers, and product teams to integrate developed voice models into production pipelines.
- Fine-tune neural vocoders and synthesis architectures to improve voice naturalness and emotional range.
- Stay current with the latest advancements in speech processing, AI voice synthesis, and deep generative models through academic literature and open-source projects.
- Contribute to the development of tools and APIs for deploying models on cloud and edge environments with high efficiency and low latency.
Required Skills
- Strong understanding of speech signal processing, speech synthesis, and automatic speech recognition (ASR) systems.
- Hands-on experience with voice cloning frameworks such as Descript Overdub, Coqui TTS, SV2TTS, Tacotron, FastSpeech, or similar.
- Proficiency in Python and deep learning frameworks like PyTorch or TensorFlow.
- Experience working with speech libraries and toolkits such as ESPnet, Kaldi, Librosa, or SpeechBrain.
- In-depth knowledge of mel spectrograms, vocoder architectures (e.g., WaveNet, HiFi-GAN, WaveGlow), and their role in speech synthesis.
- Familiarity with REST APIs, model deployment, and cloud-based inference systems using platforms like AWS, Azure, or GCP.
- Ability to optimize models for performance in real-time or low-latency environments.
Preferred Qualifications
- Experience in real-time voice transformation, including pitch shifting, timing modification, or emotion modulation.
- Exposure to emotion-aware speech synthesis, multilingual voice models, or prosody modeling.
- Background in audio DSP (Digital Signal Processing) and speech analysis techniques.
- Previous contributions to open-source speech AI projects or publications in relevant domains.
Why Join Us
You will be part of a fast-moving, collaborative team working at the forefront of voice AI innovation. This role offers the opportunity to make a significant impact on products that reach millions of users, helping to shape the future of interactive voice experiences.
Skills: automatic speech recognition (ASR), vocoder architectures, voice cloning, voice processing, data, real-time voice transformation, speech synthesis, PyTorch, TensorFlow, voice conversion, speech signal processing, audio DSP, REST APIs, Python, cloud deployment, transformation, mel spectrograms, deep learning