7 Whisper Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

5.0 - 10.0 years

5 - 10 Lacs

Bengaluru, Karnataka, India

On-site

What will you do
• Voice AI Stack Ownership: Build and own the end-to-end voice bot pipeline (ASR, NLU, dialog state management, tool calling, and TTS) to create a natural, human-like conversation experience.
• LLM Orchestration & Tooling: Architect systems using MCP (Model Context Protocol) to mediate structured context between real-time ASR, memory, APIs, and the LLM.
• RAG Integration: Implement retrieval-augmented generation to ground responses using dealership knowledge bases, inventory data, recall lookups, and FAQs.
• Vector Store & Memory: Design scalable vector-based search for dynamic FAQ handling, call recall, and user-specific memory embedding.
• Latency Optimization: Engineer low-latency, streaming ASR + TTS pipelines and fine-tune turn-taking models for natural conversation.
• Model Tuning & Hallucination Control: Use fine-tuning, LoRA, or instruction tuning to customize tone, reduce hallucinations, and align responses to business goals.
• Instrumentation & QA Looping: Build robust observability, run real-time call QA pipelines, and analyze interruptions, hallucinations, and fallbacks.
• Cross-functional Collaboration: Work closely with product, infra, and leadership to scale this bot to thousands of US dealerships.

What will make you successful in this role
• Architect-level thinking: You understand how ASR, LLMs, memory, and tools fit together and can design modular, observable, and resilient systems.
• LLM Tooling Mastery: You've implemented tool calling, retrieval pipelines, function calls, or prompt chaining across multiple workflows.
• Fluency in Vector Search & RAG: You know how to chunk, embed, index, and retrieve, and how to avoid prompt bloat and token overflow.
• Latency-First Mindset: You debug token delays, know the cost of each API hop, and can optimize round-trip time to keep calls human-like.
• Grounding > Hallucination: You know how to trace hallucinations back to weak prompts, missing guardrails, or lack of tool access, and fix them.
• Prototyper at heart: You're not scared of building from scratch and iterating fast, using open-source or hosted tools as needed.

What you must have
• 5+ years in AI/ML or voice/NLP systems with real-time experience
• Deep knowledge of LLM orchestration, RAG, vector search, and prompt engineering
• Experience with MCP-style architectures or structured context pipelines between LLMs and APIs/tools
• Experience integrating ASR (Whisper/Deepgram), TTS (ElevenLabs/Coqui), and OpenAI/GPT-style models
• Solid understanding of latency optimization, streaming inference, and real-time audio pipelines
• Hands-on with Python, FastAPI, vector DBs (Pinecone, Weaviate, FAISS), and cloud infra (AWS/GCP)
• Strong debugging, logging, and QA instincts for hallucination, grounding, and UX behavior

Posted 1 month ago


5.0 - 9.0 years

0 Lacs

Pune, Maharashtra

On-site

As a GenAI Developer at Vipracube Tech Solutions, you will be responsible for developing and optimizing AI models, implementing AI algorithms, collaborating with cross-functional teams, conducting research on emerging AI technologies, and deploying AI solutions. This full-time role requires 5 to 6 years of experience and is based in Pune, with the flexibility of some work from home. Your key responsibilities will include fine-tuning large language models tailored to marketing and operational use cases, and building Generative AI solutions using platforms like OpenAI (GPT, DALL·E, Whisper) and agentic AI platforms such as LangGraph and AWS Bedrock. You will also build robust pipelines using Python, NumPy, and Pandas, apply traditional ML techniques, handle CI/CD and MLOps, use AWS Cloud Services, collaborate using tools like Cursor, and communicate effectively with stakeholders and clients. To excel in this role, you should have 5+ years of relevant AI/ML development experience, a strong portfolio of AI projects in marketing or operations domains, and a proven ability to work independently and meet deadlines. Join our dynamic team and contribute to creating smart, efficient, and future-ready digital products for businesses and startups.

Posted 1 month ago


6.0 - 10.0 years

6 - 10 Lacs

Bengaluru, Karnataka, India

On-site

What You'll Do
• Design & Build: Develop multi-agent AI systems for the UCaaS platform, focusing on NLP, speech recognition, audio intelligence, and LLM-powered interactions.
• Rapid Experiments: Prototype with open-weight models (Mistral, LLaMA, Whisper, etc.) and scale what works.
• Code for Excellence: Write robust code for AI/ML libraries and champion software best practices.
• Optimize for Scale & Cost: Engineer scalable AI pipelines, focusing on latency, throughput, and cloud costs.
• Innovate with LLMs: Fine-tune and deploy LLMs for summarization, sentiment and intent detection, RAG pipelines, multi-modal inputs, and multi-agentic task automation.
• Own the Stack: Lead multi-agentic environments from data to deployment and scale.
• Collaborate & Lead: Integrate AI with cross-functional teams and mentor junior engineers.

What You Bring
• Experience: 6-10 years of professional experience, with a mandatory minimum of 2 years dedicated to a hands-on role in a real-world, production-level AI/ML project.
• Coding & Design: Expert-level programming skills in Python and proficiency in designing and building scalable, distributed systems.
• ML/AI Expertise: Deep, hands-on experience with core ML/AI libraries and frameworks, agentic systems, and RAG pipelines. Hands-on experience in using vector DBs.
• LLM Proficiency: Proven experience working with and fine-tuning Large Language Models (LLMs).
• Scalability & Optimization Mindset: Demonstrated experience in building and scaling AI services in the cloud, with a strong focus on performance tuning and cost optimization of agents specifically.

Nice to Have
• You've tried out agent frameworks like LangGraph, CrewAI, or AutoGen and can explain the pros and cons of autonomous vs. orchestrated agents.
• Experience with MLOps tools and platforms (e.g., Kubeflow, MLflow, SageMaker).
• Real-time streaming AI experience: token-level generation, WebRTC integration, or live transcription systems.
• Contributions to open-source AI/ML projects or a strong public portfolio (GitHub, Kaggle).

Posted 1 month ago


3.0 - 7.0 years

6 - 15 Lacs

Hyderabad

Remote

Job Title: Python Developer - AI Automation
Location: Remote or On-site (Preferred: Hyderabad)
Employment Type: Full-time
Experience Level: 3+ years
Compensation: Competitive (based on experience)
Start Date: Immediate

Note from Hiring Manager
We need someone with 2-3 years of Python coding experience and recent work experience on AWS, GCP, or Azure. We have the idea; we need someone who can support us in writing the code.

About the Role
We are seeking a dedicated and skilled Python Developer to join our fast-growing AI-based interview automation platform, which autonomously conducts interviews over Zoom using cutting-edge technologies like OpenAI, ElevenLabs, Deepgram, and JobDiva APIs. You'll be responsible for maintaining, optimizing, and extending the backend logic of our production-grade AI orchestration system. This is a critical role where you'll work directly with the product owner to continuously improve reliability, latency, and intelligence in live interview sessions.

Responsibilities
• Maintain and enhance Python-based orchestration logic for AI-led Zoom interviews.
• Integrate and manage APIs for Zoom, JobDiva (ATS), OpenAI, Deepgram, ElevenLabs, and MariaDB.
• Implement logic for participant tracking (e.g., recruiter vs candidate); speech-to-text and voice synthesis in real time; and AI-based scoring, contextual response generation, and interview flow.
• Write modular, well-documented, testable, and scalable code.
• Debug production issues, monitor logs, and handle retries or fallback flows.
• Optimize latency, silence detection, and session reliability.
• Implement new features (e.g., WebRTC integration, calendar syncing, real-time scoring UI).
• Work with Docker and AWS for deployment and scaling.
• Collaborate with UI/UX and DevOps teams for a seamless experience.

Required Skills
• Strong Python (3.9+) skills: core scripting, async programming, file I/O, exception handling.
• Experience integrating REST APIs with requests, httpx, or aiohttp.
• Familiarity with Zoom API, OAuth, and meeting lifecycle handling.
• Experience working with OpenAI GPT, ElevenLabs, Whisper, or similar AI/LLM services.
• Knowledge of speech processing libraries (e.g., sounddevice, pyaudio, ffmpeg, Deepgram, Whisper).
• Database experience with MariaDB/MySQL using SQLAlchemy.
• Proficient in logging, debugging, and error tracking in production environments.
• Familiarity with Docker and basic AWS (EC2, Lambda, etc.).
• Comfortable working in Linux/macOS and managing audio devices (e.g., VB-CABLE, BlackHole).

Preferred Qualifications
• Experience with AI workflow orchestration tools.
• Prior work in HR tech, video conferencing, or voice assistants.
• Familiarity with frontend integration (Flask/Streamlit/Gradio for internal tools).
• Experience with web scraping, cron jobs, or automation scripts.

Why Join Us
• Be part of a trailblazing AI product used in live interviews globally.
• Shape the future of automated hiring and AI-human interactions.
• Flexible work environment, clear product roadmap, and rapid iteration culture.
• Opportunity to grow into a Tech Lead or AI Architect role as the product scales.

Posted 1 month ago


6.0 - 11.0 years

40 - 60 Lacs

Kolkata

Work from Office

We're looking for an experienced AI/ML Technical Lead to architect and drive the development of our intelligent conversation engine. You'll lead model selection, integration, training workflows (RAG/fine-tuning), and scalable deployment of natural language and voice AI components. This is a foundational hire for a technically ambitious platform.

Key Responsibilities
• AI System Architecture: Design the architecture of the AI-powered agent, including LLM-based conversation workflows, voice bots, and follow-up orchestration.
• Model Integration & Prompt Engineering: Leverage APIs from OpenAI or Anthropic, or deploy open models (e.g., LLaMA 3, Mistral). Implement effective prompt strategies and retrieval-augmented generation (RAG) pipelines for contextual responses.
• Data Pipelines & Knowledge Management: Build secure data pipelines to ingest, embed, and serve tenant-specific knowledge bases (FAQs, scripts, product docs) using vector databases (e.g., Pinecone, Weaviate).
• Voice & Text Interfaces: Implement and optimize multimodal agents (text + voice) using ASR (e.g., Whisper), TTS (e.g., Polly), and NLP for automated qualification and call handling.
• Conversational Flow Orchestration: Design dynamic, stateful conversations that can take actions (e.g., book meetings, update CRM records) using tools like LangChain, Temporal, or n8n.
• Platform Scalability: Ensure models and agent workflows scale across tenants with strong data isolation, caching, and secure API access.
• Lead a Cross-Functional Team: Collaborate with backend, frontend, and DevOps engineers to ship intelligent, production-ready features.
• Monitoring & Feedback Loops: Define and monitor conversation analytics (drop-offs, booking rates, escalation triggers), and create pipelines to improve AI quality continuously.

Qualifications

Must-Haves:
• 5+ years of experience in ML/AI, with at least 2 years leading conversational AI or LLM projects.
• Strong background in NLP, dialog systems, or voice AI, preferably with production experience.
• Experience with OpenAI or open-source LLMs (e.g., LLaMA, Mistral, Falcon) and orchestration tools (LangChain, etc.).
• Proficiency with Python and ML frameworks (Hugging Face, PyTorch, TensorFlow).
• Experience deploying RAG pipelines and vector DBs (e.g., Pinecone, Weaviate), and managing LLM-agent logic.
• Familiarity with voice processing (ASR, TTS, IVR design).
• Solid understanding of API-based integration and microservices.
• Deep care for data privacy, multi-tenancy security, and ethical AI practices.

Nice-to-Haves:
• Experience with CRM ecosystems (e.g., Salesforce, HubSpot) and how AI agents sync actions to CRMs.
• Knowledge of sales pipelines and marketing automation tools.
• Exposure to calendar integrations (Google Calendar API, Microsoft Graph).
• Knowledge of Twilio APIs (SMS, Voice, WhatsApp) and channel orchestration logic.
• Familiarity with Docker, Kubernetes, CI/CD, and scalable cloud infrastructure (AWS/GCP/Azure).

What We Offer
• Founding team role with strong ownership and autonomy
• Opportunity to shape the future of AI-powered sales
• Flexible work environment
• Competitive salary
• Access to cutting-edge AI tools and training resources

Post your resume and any relevant project links (GitHub, blog, portfolio) to career@sourcedeskglobal.com. Include a short note on your most interesting AI project or voicebot/conversational AI experience.

Posted 2 months ago


2.0 - 5.0 years

10 - 20 Lacs

Noida

Work from Office

What would you do?
• System Design: Architect and design end-to-end speech processing pipelines, from data acquisition to model deployment. Ensure systems are scalable, efficient, and maintainable.
• Advanced Modeling: Develop and implement advanced machine learning models for speech recognition, speaker diarization, and related tasks. Utilize state-of-the-art techniques such as deep learning, transfer learning, and ensemble methods.
• Research and Development: Conduct research to explore new methodologies and tools in the field of speech processing. Publish findings and present at industry conferences.
• Performance Optimization: Continuously monitor and optimize system performance, focusing on accuracy, latency, and resource utilization.
• Collaboration: Work closely with product management, data science, and software engineering teams to define project requirements and deliver innovative solutions.
• Customer Interaction: Engage with customers to understand their needs and provide tailored speech solutions. Assist in troubleshooting and optimizing deployed systems.
• Documentation and Standards: Establish and enforce best practices for code quality, documentation, and model management within the team.

Required Skills
• 2+ years of experience in speech processing, machine learning, and model deployment.
• Demonstrated expertise in leading projects and teams.

Technical skills:
• Excellent knowledge of Python / Java programming.
• In-depth knowledge of speech processing frameworks such as wav2vec, Kaldi, HTK, DeepSpeech, and Whisper.
• Experience with NLP, STT, Speech-to-Speech LLMs, and frameworks like NVIDIA NeMo and PyAnnote.
• Proficiency in Python and machine learning libraries such as TensorFlow, PyTorch, or Keras.
• Experience with large-scale ASR systems, speaker recognition, and diarization algorithms.
• Strong understanding of neural networks, sequence-to-sequence models, transformers, and attention mechanisms.
• Familiarity with NLP techniques and their integration with speech systems.
• Expertise in deploying models on cloud platforms and optimizing for real-time applications.

Good to have:
• Experience with low-latency streaming ASR systems.
• Knowledge of speech synthesis, STT (Speech-to-Text), and TTS (Text-to-Speech) systems.
• Experience in multilingual and low-resource speech processing.

Posted 2 months ago


2 - 5 years

4 - 9 Lacs

Bengaluru

Work from Office

Role & responsibilities
• Build audio ML/DL models for the given requirement
• Build models from scratch wherever required
• Must be able to code relevant research papers independently
• Tune to improve accuracy and latency
• Support to satisfy the deployment requirements of the product
• Apply transfer learning wherever required
• Training, fine-tuning, and optimization of different flavors of transformer models
• Hands-on experience with LLMs, including fine-tuning and optimization
• Audio AI modelling and tuning experience
• Audio AI concepts; ML/DL working experience in the PyTorch/TensorFlow/Keras frameworks
• Variations of CNN, RNN, attention mechanisms, transformers, and large models like wav2vec and Whisper
• Proficiency in machine learning/audio pre-processing libraries/frameworks such as Librosa, Kaldi, sklearn, python speech features, NumPy, SciPy, Pandas, Matplotlib
• Signal processing fundamentals; understanding the data characteristics and wave signals
• Transfer learning experience in acoustic models for speech recognition systems
• Experience in voice activity detection with noise analysis & denoising
• Experience in WakeupWord, ASR
• Working experience of LLM fine-tuning, optimization, and performance improvement
• Audio AI; audio-related certification
• Tools for creating synthetic data
• Experience working in edge devices
• Experience in noise augmentation and mixing techniques

Posted 3 months ago
