Posted: 3 days ago
On-site
Contractual
Location: Gurgaon, India
Type: Full-time Internship (6–12 months)
Who: Final-year engineering students or recent graduates passionate about AI/ML in speech
At JoshTalks AI Lab, we believe that voice will be the primary medium of interaction between man and machine. Our mission is simple yet ambitious:
● Help machines talk like humans.
● Build the benchmarks and datasets that become the backbone of global progress in speech AI.
● Drive improvements not just through compute or algorithms — but through high-quality, diverse, real-world data.
Our datasets today power some of the largest and most widely used speech models in the world (you’ve definitely used them, even if we can’t name them 😉).
This is not “just another internship.” You’ll be directly contributing to the global race to perfect speech AI:
1. Benchmarking the world’s speech models
● Design and run evaluations for ASR and speech-to-speech systems.
● Create benchmarks that will guide top AI labs on where their models fail and where they shine.
2. Modeling & Fine-Tuning
● Fine-tune speech recognition systems (such as Whisper and wav2vec2) to push word error rate (WER) down toward 5%.
● Experiment with multilingual, code-switched, and noisy speech to mimic real-world conditions.
3. Impact at Scale
● Your work won’t just sit in a paper. It will influence how the world’s largest AI models get built, tested, and improved.
● Final-year undergraduates (B.Tech/B.E.) in CSE, EE, AI/ML, or related fields.
● Strong interest in speech, audio, NLP, or multimodal AI.
● Hands-on experience in one or more of:
    ● Fine-tuning speech or language models (Whisper, wav2vec2, HuBERT, SER, etc.)
    ● Building speech-driven projects (assistants, classifiers, chatbots, SER systems)
    ● Working with PyTorch, TensorFlow, or Hugging Face Transformers
● Bonus: past projects on GitHub, Kaggle, or research papers.
● Ownership: Even as a final-year student, you’ll get the chance to own problems of global importance, from reducing ASR word error rates toward 5% to building benchmarks that influence how the next generation of speech-to-speech models is developed. These are not side projects: the problems you’ll work on may define how billions of people interact with machines in the future.
● Front-row seat in speech AI: Your work will shape benchmarks and datasets used by the world’s top model labs.
● Learning: Work with experts solving speech challenges across 20+ Indian languages and noisy, real-world audio.
● Impactful projects: The benchmarks and models you help build will set direction for global AI progress.
● Startup energy, global scale: Small team, big impact — perfect for ambitious builders.
● Co-Authorship: If any of the work you contribute to is published as a paper, benchmark report, or dataset release, you will be credited as a co-author. This means your contributions won’t just stay inside the lab — they’ll be visible to the wider research community and part of the academic and industry record.
● Location: Gurgaon (on-site preferred for collaboration)
● Duration: 6–12 months
● Type: Paid Internship (full-time)
● Start Date: Flexible for final-year students (aligns with academic calendar)
If you’re someone who dreams of making speech AI as natural as human conversation, this is your chance to work on the real frontier. Super interested? You can also directly write to our founder Shobhit at shobhit@joshtalks.com
To apply, write to hr@joshtalks.com.
Josh Talks