
8 Prosody Jobs

Set up a job alert
JobPe aggregates listings for convenient access; applications are submitted directly on the original job portal.

0 years

0 Lacs

Hyderabad, Telangana, India

On-site

About the Role:
We are seeking a highly experienced Voice AI/ML Engineer to lead the design and deployment of real-time voice intelligence systems. This role focuses on ASR, TTS, speaker diarization, wake word detection, and building production-grade modular audio processing pipelines to power next-generation contact centre solutions, intelligent voice agents, and telecom-grade audio systems. You will work at the intersection of deep learning, streaming infrastructure, and speech/NLP technology, creating scalable, low-latency systems across diverse audio formats and real-world applications.

Key Responsibilities:

Voice & Audio Intelligence:
- Build, fine-tune, and deploy ASR models (e.g., Whisper, wav2vec 2.0, Conformer) for real-time transcription.
- Develop and fine-tune high-quality TTS systems using VITS, Tacotron, or FastSpeech for lifelike voice generation and cloning.
- Implement speaker diarization for segmenting and identifying speakers in multi-party conversations using embeddings (x-vectors/d-vectors) and clustering (AHC, VBx, spectral clustering).
- Design robust wake word detection models with ultra-low latency and high accuracy in noisy conditions.

Real-Time Audio Streaming & Voice Agent Infrastructure:
- Architect bi-directional real-time audio streaming pipelines using WebSocket, gRPC, Twilio Media Streams, or WebRTC.
- Integrate voice AI models into live voice agent solutions, IVR automation, and AI contact centre platforms.
- Optimize for latency, concurrency, and continuous audio streaming with context buffering and voice activity detection (VAD).
- Build scalable microservices to process, decode, encode, and stream audio across common codecs (e.g., PCM, Opus, μ-law, AAC, MP3) and containers (e.g., WAV, MP4).

Deep Learning & NLP Architecture:
- Utilize transformers, encoder-decoder models, GANs, VAEs, and diffusion models for speech and language tasks.
- Implement end-to-end pipelines including text normalization, G2P mapping, NLP intent extraction, and emotion/prosody control.
- Fine-tune pre-trained language models for integration with voice-based user interfaces.

Modular System Development:
- Build reusable, plug-and-play modules for ASR, TTS, diarization, codecs, streaming inference, and data augmentation.
- Design APIs and interfaces for orchestrating voice tasks across multi-stage pipelines with format conversions and buffering.
- Develop performance benchmarks and optimize for CPU/GPU, memory footprint, and real-time constraints.

Engineering & Deployment:
- Write robust, modular, and efficient Python code.
- Use Docker, Kubernetes, and cloud deployment (AWS, Azure, GCP).
- Optimize models for real-time inference using ONNX, TorchScript, and CUDA, including quantization, context-aware inference, and model caching.
- Deploy voice models on-device.

Why join us?
- Impactful Work: Play a pivotal role in safeguarding Tanla's assets, data, and reputation in the industry.
- Tremendous Growth Opportunities: Be part of a rapidly growing company in the telecom and CPaaS space, with opportunities for professional development.
- Innovative Environment: Work alongside a world-class team in a challenging and fun environment, where innovation is celebrated.

Tanla is an equal opportunity employer. We champion diversity and are committed to creating an inclusive environment for all employees. www.tanla.com
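To make the streaming-pipeline work described above concrete, here is a minimal sketch of chunked audio framing with a simple energy-based voice activity detector. The frame size, threshold, and toy signal are illustrative assumptions, not part of the posting; production systems would use a trained VAD model.

```python
import numpy as np

def frame_audio(samples: np.ndarray, frame_len: int) -> np.ndarray:
    """Split a mono signal into fixed-size frames, dropping the ragged tail."""
    n_frames = len(samples) // frame_len
    return samples[: n_frames * frame_len].reshape(n_frames, frame_len)

def energy_vad(frames: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Mark frames whose RMS energy exceeds a (hypothetical) threshold as speech."""
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    return rms > threshold

# Toy signal: 0.5 s of silence followed by 0.5 s of a 440 Hz tone at 16 kHz.
sr, frame_len = 16000, 320  # 20 ms frames, a common streaming chunk size
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), 0.1 * np.sin(2 * np.pi * 440 * t)])

frames = frame_audio(signal, frame_len)
speech_mask = energy_vad(frames)
print(speech_mask[:5], speech_mask[-5:])  # silent frames False, tone frames True
```

In a live agent, the speech mask gates which chunks are forwarded to the ASR model, which is how latency and bandwidth stay bounded during silence.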

Posted 3 weeks ago

Apply


0 years

0 Lacs

North Guwahati, Assam, India

On-site

Company Description
The Indian Institute of Technology, Guwahati (IIT Guwahati) is a premier research and engineering institute in India, established in 1994. Located in North Guwahati, it is the sixth member of the IIT fraternity. The institute offers a wide range of programs across eleven departments, including B.Tech., B.Des., M.Tech., Ph.D., and M.Sc. programs, and emphasizes cutting-edge research and academic excellence.

Role Description
This is a full-time on-site role for a Post-Doctoral Fellow in Prosody, Phonetics, and Phonology at the Indian Institute of Technology, Guwahati. The Post-Doctoral Fellow will conduct advanced research in prosody, phonetics, and phonology, apply laboratory methods in experimental phonetics, and analyze data. The role also includes teaching responsibilities, aiding in the design and delivery of relevant coursework, and collaborating with other faculty and researchers.

Qualifications
- Proficiency in linguistic experimentation using state-of-the-art laboratory methods and experience in experimental phonetics
- Strong research background in prosody, phonetics, and phonology
- Experience with data analysis in linguistic research
- Teaching skills and experience in designing and delivering coursework in related fields
- Excellent written and verbal communication skills
- Ability to work collaboratively and independently
- Ph.D. in Linguistics required

Posted 1 month ago

Apply

0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Company Description
Echoleads.ai leverages AI-powered sales agents to engage, qualify, and convert leads through real-time voice conversations. Our voice bots act as scalable sales representatives, making thousands of smart, human-like calls daily to follow up instantly, ask the right questions, and book appointments effortlessly. Echoleads integrates seamlessly with lead sources like Meta Ads, Google Ads, and CRMs, ensuring leads are never missed. Serving modern sales and marketing teams across various industries, our AI agents proficiently handle outreach, lead qualification, and appointment setting.

About the Role:
We are seeking a highly experienced Voice AI/ML Engineer to lead the design and deployment of real-time voice intelligence systems. This role focuses on ASR, TTS, speaker diarization, wake word detection, and building production-grade modular audio processing pipelines to power next-generation contact center solutions, intelligent voice agents, and telecom-grade audio systems. You will work at the intersection of deep learning, streaming infrastructure, and speech/NLP technology, creating scalable, low-latency systems across diverse audio formats and real-world applications.

Key Responsibilities:

Voice & Audio Intelligence:
- Build, fine-tune, and deploy ASR models (e.g., Whisper, wav2vec 2.0, Conformer) for real-time transcription.
- Develop and fine-tune high-quality TTS systems using VITS, Tacotron, or FastSpeech for lifelike voice generation and cloning.
- Implement speaker diarization for segmenting and identifying speakers in multi-party conversations using embeddings (x-vectors/d-vectors) and clustering (AHC, VBx, spectral clustering).
- Design robust wake word detection models with ultra-low latency and high accuracy in noisy conditions.

Real-Time Audio Streaming & Voice Agent Infrastructure:
- Architect bi-directional real-time audio streaming pipelines using WebSocket, gRPC, Twilio Media Streams, or WebRTC.
- Integrate voice AI models into live voice agent solutions, IVR automation, and AI contact center platforms.
- Optimize for latency, concurrency, and continuous audio streaming with context buffering and voice activity detection (VAD).
- Build scalable microservices to process, decode, encode, and stream audio across common codecs (e.g., PCM, Opus, μ-law, AAC, MP3) and containers (e.g., WAV, MP4).

Deep Learning & NLP Architecture:
- Utilize transformers, encoder-decoder models, GANs, VAEs, and diffusion models for speech and language tasks.
- Implement end-to-end pipelines including text normalization, G2P mapping, NLP intent extraction, and emotion/prosody control.
- Fine-tune pre-trained language models for integration with voice-based user interfaces.

Modular System Development:
- Build reusable, plug-and-play modules for ASR, TTS, diarization, codecs, streaming inference, and data augmentation.
- Design APIs and interfaces for orchestrating voice tasks across multi-stage pipelines with format conversions and buffering.
- Develop performance benchmarks and optimize for CPU/GPU, memory footprint, and real-time constraints.

Engineering & Deployment:
- Write robust, modular, and efficient Python code.
- Use Docker, Kubernetes, and cloud deployment (AWS, Azure, GCP).
- Optimize models for real-time inference using ONNX, TorchScript, and CUDA, including quantization, context-aware inference, and model caching.
- Deploy voice models on-device.

Posted 1 month ago

Apply


3.0 years

0 Lacs

India

Remote

Role: AI Developer – Voice AI + Lifelike Avatars (India | Full-Time Remote)

At Data-Hat AI, we're building the future of AI-powered interaction, where lifelike avatars speak, think, and respond like humans. We're looking for an AI Developer based in India who thrives at the intersection of Speech-to-Text, Text-to-Speech, and LLM-driven conversational AI.

What You'll Work On:
- Build end-to-end STT + TTS pipelines using tools like Whisper, ElevenLabs, Unreal Engine, Polly, and Azure TTS
- Integrate speech with lifelike avatars and holographic visuals
- Fine-tune LLMs (OpenAI, Cohere, Anthropic, etc.) to power dynamic conversations
- Deploy scalable, real-time systems on Azure or AWS
- Push the boundaries of what's possible with expressive AI speech + 3D interaction

What You Bring:
- 3+ years of Python development in AI/ML environments
- Hands-on experience with voice AI (Whisper, Polly, Azure Speech, etc.)
- Experience with Text-to-Speech (emotion, SSML, prosody control, Unreal, etc.)
- LLM experience (prompting, fine-tuning, API integration)
- Exposure to 3D avatar engines (Unity, Unreal, Ready Player Me, etc.) is a plus
- Solid cloud experience on Azure or AWS

Location: Full-time Remote (India only)

Why Join Us?
- Work on the cutting edge of voice, vision, and AI
- Help shape the AI avatars and digital humans of the future
- Join a high-velocity, globally experienced team backed by top industry leaders

Ready to build the future? Email your CV and any relevant project/demo links to hiring@data-hat.ai. Let's bring AI to life, literally.
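The STT + LLM + TTS pipeline this role centers on can be sketched as a single conversational turn. The engines below are stubs standing in for real services (Whisper, an LLM API, Polly, etc.); the function names are ours, not Data-Hat's.

```python
from typing import Callable

def run_turn(audio: bytes,
             stt: Callable[[bytes], str],
             llm: Callable[[str], str],
             tts: Callable[[str], bytes]) -> bytes:
    """One conversational turn: transcribe, generate a reply, synthesize speech."""
    transcript = stt(audio)       # speech-to-text
    reply_text = llm(transcript)  # conversational response
    return tts(reply_text)        # text-to-speech for the avatar to play

# Stub engines so the flow is runnable without any external service.
fake_stt = lambda audio: "hello avatar"
fake_llm = lambda text: f"You said: {text}"
fake_tts = lambda text: text.encode("utf-8")

reply_audio = run_turn(b"\x00\x01", fake_stt, fake_llm, fake_tts)
print(reply_audio)  # b'You said: hello avatar'
```

Keeping each stage behind a plain callable is what makes it easy to swap, say, Whisper for Azure Speech without touching the avatar-facing code.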

Posted 1 month ago

Apply

3.0 years

0 Lacs

India

Remote

Job Title: Voice Processing Specialist
Location: Remote / Jaipur
Job Type: Full-time / Contract
Experience: 3+ years of expertise in voice cloning, transformation, and synthesis technologies

Job Summary
We are seeking a talented and motivated Voice Processing Specialist to join our team and lead the development of innovative voice technologies. The ideal candidate will have a deep understanding of speech synthesis, voice cloning, and transformation techniques. You will play a critical role in designing, implementing, and deploying state-of-the-art voice models that enhance the naturalness, personalization, and flexibility of speech in AI-powered applications. This role is perfect for someone passionate about advancing human-computer voice interaction and creating lifelike, adaptive voice systems.

Key Responsibilities
- Design, develop, and optimize advanced deep learning models for voice cloning, text-to-speech (TTS), voice conversion, and real-time voice transformation.
- Implement speaker embedding and voice identity preservation techniques to support accurate, high-fidelity voice replication.
- Work with large-scale and diverse audio datasets, including preprocessing, segmentation, normalization, and data augmentation to improve model generalization and robustness.
- Collaborate closely with data scientists, ML engineers, and product teams to integrate voice models into production pipelines.
- Fine-tune neural vocoders and synthesis architectures for better voice naturalness and emotional range.
- Stay current with the latest advancements in speech processing, AI voice synthesis, and deep generative models through academic literature and open-source projects.
- Contribute to tools and APIs for deploying models on cloud and edge environments with high efficiency and low latency.

Required Skills
- Strong understanding of speech signal processing, speech synthesis, and automatic speech recognition (ASR) systems.
- Hands-on experience with voice cloning frameworks such as Descript Overdub, Coqui TTS, SV2TTS, Tacotron, FastSpeech, or similar.
- Proficiency in Python and deep learning frameworks like PyTorch or TensorFlow.
- Experience with speech libraries and toolkits such as ESPnet, Kaldi, Librosa, or SpeechBrain.
- In-depth knowledge of mel spectrograms, vocoder architectures (e.g., WaveNet, HiFi-GAN, WaveGlow), and their role in speech synthesis.
- Familiarity with REST APIs, model deployment, and cloud-based inference systems on platforms like AWS, Azure, or GCP.
- Ability to optimize models for performance in real-time or low-latency environments.

Preferred Qualifications
- Experience in real-time voice transformation, including pitch shifting, timing modification, or emotion modulation.
- Exposure to emotion-aware speech synthesis, multilingual voice models, or prosody modeling.
- Background in audio DSP (digital signal processing) and speech analysis techniques.
- Previous contributions to open-source speech AI projects or publications in relevant domains.

Why Join Us
You will be part of a fast-moving, collaborative team working at the forefront of voice AI innovation. This role offers the opportunity to make a significant impact on products that reach millions of users, helping to shape the future of interactive voice experiences.

Skills: automatic speech recognition (ASR), vocoder architectures, voice cloning, voice processing, data, real-time voice transformation, speech synthesis, PyTorch, TensorFlow, voice conversion, speech signal processing, audio DSP, REST APIs, Python, cloud deployment, transformation, mel spectrograms, deep learning
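As background on the mel spectrograms this listing asks about: the mel scale maps frequency to perceived pitch, and mel filterbank edges are spaced evenly in mel rather than in Hz. A minimal sketch of the common HTK-style conversion (the constants 2595 and 700 are the standard ones; the function names and the 8 kHz range are our illustrative choices):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Five band edges evenly spaced in mel between 0 and 8000 Hz: note how the
# resulting Hz edges bunch up at low frequencies, mirroring human hearing.
lo, hi = hz_to_mel(0.0), hz_to_mel(8000.0)
edges_hz = [mel_to_hz(lo + i * (hi - lo) / 4) for i in range(5)]
print([round(f) for f in edges_hz])
```

Toolkits like Librosa build their mel filterbanks from exactly this kind of edge spacing before projecting an STFT magnitude onto the bands.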

Posted 1 month ago

Apply

0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site

Roles & Responsibilities
- Design, implement, and train deep learning models for:
  - Text-to-Speech (e.g., SpeechT5, StyleTTS2, YourTTS, XTTS-v2, or similar models)
  - Voice cloning with speaker embeddings (x-vectors, d-vectors), few-shot adaptation, and prosody and emotion transfer
- Engineer multilingual audio-text preprocessing pipelines:
  - Text normalization, grapheme-to-phoneme (G2P) conversion, Unicode normalization (NFC/NFD)
  - Silence trimming, VAD-based audio segmentation, audio enhancement for noisy corpora, speech prosody modification, and waveform manipulation
- Build scalable data loaders using PyTorch for large-scale, multi-speaker datasets with variable-length sequences and chunked streaming
- Extract and process acoustic features: log-mel spectrograms, pitch contours, MFCCs, energy, speaker embeddings
- Optimize training using mixed precision (FP16/BFloat16), gradient checkpointing, label smoothing, and quantization-aware training
- Build serving infrastructure for inference using TorchServe, ONNX Runtime, Triton Inference Server, and FastAPI (for REST endpoints), including batch and real-time modes
- Optimize models for production: quantization, model pruning, ONNX conversion, parallel decoding, GPU/CPU memory profiling
- Create automated and human evaluation logic: MOS, PESQ, STOI, BLEU, WER/CER, multi-speaker test sets, multilingual subjective listening tests
- Implement ethical deployment safeguards: digital watermarking, impersonation detection, and voice verification for cloned speech
- Conduct literature reviews and reproduce state-of-the-art papers; adapt and improve on open benchmarks
- Mentor junior contributors, review code, and maintain shared research and model repositories
- Collaborate across teams (MLOps, backend, product, linguists) to translate research into deployable, user-facing solutions

Required Skills
- Advanced proficiency in Python and PyTorch (TensorFlow a plus)
- Strong grasp of deep learning concepts: sequence-to-sequence models, Transformers, autoregressive and non-autoregressive decoders, attention mechanisms, VAEs, GANs
- Experience with modern speech processing toolkits: ESPnet, NVIDIA NeMo, Coqui TTS, OpenSeq2Seq, or equivalent
- Ability to design custom loss functions (mel loss, GAN loss, KL divergence, attention losses, etc.) and to manage learning rate schedules and training stability
- Hands-on experience with multilingual and low-resource language modeling
- Understanding of transformer architectures and LLMs, and experience working with existing AI models, tools, and APIs
- Model serving & API integration: TorchServe, FastAPI, Docker, ONNX Runtime

Preferred (Bonus) Skills
- CUDA kernel optimization, custom GPU operations, memory footprint profiling
- Experience deploying on AWS/GCP with GPU acceleration
- Experience developing RESTful APIs for real-time TTS/voice cloning endpoints
- Publications or open-source contributions in TTS, ASR, or speech processing
- Working knowledge of multilingual translation pipelines
- Knowledge of speaker diarization, voice anonymization, and speech synthesis for agglutinative/morphologically rich languages

Milestones & Expectations (First 3–6 Months)
- Deliver at least one production-ready TTS or voice cloning model integrated with India Speaks' Dubbing Studio or SaaS APIs
- Create a fully reproducible experiment pipeline for multilingual speech modeling, complete with model cards and performance benchmarks
- Contribute to custom evaluation tools for measuring quality across Indian languages
- Deploy optimized models to live staging environments using Triton, TorchServe, or ONNX
- Demonstrate impact through real-world integration in education, media, or defence deployments
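The "data loaders for variable-length sequences" responsibility above comes down to a pad-and-mask collate step. A minimal sketch of that step, using NumPy arrays as stand-ins for the PyTorch tensors a real `collate_fn` would return (function name and shapes are illustrative, not from the posting):

```python
import numpy as np

def pad_collate(batch: list, pad_value: float = 0.0):
    """Pad variable-length 1-D feature sequences to the batch maximum.

    Returns the padded (batch, max_len) array plus the original lengths,
    which the model needs in order to mask out padding positions.
    """
    lengths = np.array([len(seq) for seq in batch])
    max_len = int(lengths.max())
    padded = np.full((len(batch), max_len), pad_value, dtype=np.float32)
    for i, seq in enumerate(batch):
        padded[i, : len(seq)] = seq
    return padded, lengths

# Three utterances of different lengths collate into one rectangular batch.
batch = [np.ones(3), np.ones(5), np.ones(2)]
padded, lengths = pad_collate(batch)
print(padded.shape, lengths.tolist())  # (3, 5) [3, 5, 2]
```

In PyTorch this is typically handed to `DataLoader(collate_fn=...)`, with `torch.nn.utils.rnn.pad_sequence` doing the padding.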

Posted 1 month ago

Apply