Lead Research Scientist, Speech and Audio Foundation Models

5 - 9 years

0 Lacs

Posted: 5 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As Senior Research Lead for Speech, Audio, and Conversational AI at Krutrim, you will spearhead research and development in speech processing, text-to-speech (TTS), audio analysis, and real-time conversational AI. Working with a team of engineers and researchers, you will design, implement, and optimize state-of-the-art systems that improve the quality of speech and audio solutions across applications.

Key Responsibilities:
- Build advanced Audio Language Models and Speech Language Models by leveraging the state of the art in audio/speech and Large Language Models.
- Research, architect, and deploy new generative AI methods such as autoregressive models, causal models, and diffusion models.
- Develop low-latency end-to-end models that take multilingual speech/audio as input and produce it as output.
- Evaluate and improve model performance through experiments, focusing on accuracy, naturalness, efficiency, and real-time capability across multiple languages.
- Stay current with advances in speech processing, audio analysis, and large language models, and incorporate new techniques into foundation models.
- Collaborate with cross-functional teams to integrate these models into Krutrim's AI stack and products.
- Publish research findings in leading conferences and journals such as INTERSPEECH, ICASSP, ICLR, ICML, NeurIPS, and IEEE/ACM Transactions on Audio, Speech, and Language Processing.
- Mentor junior researchers and engineers to foster a collaborative and innovative team environment.
- Drive the adoption of best practices in model development, including testing, documentation, and ethical considerations in multilingual AI.

Qualifications:
- Ph.D. with 5+ years or MS with 8+ years of experience in Computer Science, Electrical Engineering, or a related field, with a focus on speech processing, audio analysis, and machine learning.
- Proficiency in training or fine-tuning speech/audio models for representation and generation, including multilingual multitask models.
- Expertise with Audio Language Models such as AudioPaLM, Moshi, and SeamlessM4T.
- Demonstrated track record in developing novel neural network architectures such as Transformers, Mixture of Experts, Diffusion Models, and State Space Models.
- Extensive experience in optimizing models for low-latency, real-time applications.
- Strong background in multilingual speech recognition and synthesis, with an understanding of language-specific challenges.
- Proficiency in deep learning frameworks such as TensorFlow and PyTorch, and in deploying large-scale speech and audio models.
- Expertise in high-performance computing, with proficiency in Python, C/C++, CUDA, and kernel-level programming for AI applications.
- Experience with audio signal processing techniques and their application in end-to-end neural models.
- Strong publication record in top AI conferences and journals, particularly on speech, audio, and language models.
- Excellent communication skills, with the ability to convey complex technical concepts to technical and non-technical audiences.
