We are looking for an experienced AI Engineer (Voice Agents) who has designed, built, and deployed production-grade AI voice agents, not just prototypes. You’ll join a fast-moving, product-led team where ideas go from MVP to production in weeks, not quarters.
This is a hands-on role for someone who enjoys owning the full lifecycle: architecture, modeling, evaluation, deployment, monitoring, and iteration.
What You’ll Do
1. Build & Ship Voice Agents
- Design, build, and deploy inbound and outbound voice agents for real-world business workflows.
- Own the end-to-end pipeline: speech-to-text, LLM reasoning, prompt strategy, dialog state management, TTS, and telephony integration.
- Rapidly iterate MVPs and harden them to production quality with robust SLAs.
2. Engineering & Architecture
- Architect scalable agentic systems using modern tooling (OpenAI, Anthropic, DeepSeek, Inspector, Whisper, PlayHT, ElevenLabs, PolyAI-style dialog).
- Implement retrieval (RAG), vector stores, and memory systems to boost accuracy in complex tasks.
- Integrate voice agents with CRM, ticketing, telephony, and workflow orchestration systems.
3. Production & Reliability
- Set up observability: monitoring, logging, red-teaming, evals, hallucination detection.
- Drive latency reduction and improve call-success metrics and the correctness of agent actions.
- Run A/B tests and fine-tuning, and implement guardrails and fallback flows.
4. Collaboration & Product Work
- Work closely with product, engineering, and domain stakeholders to define agent behavior.
- Translate business processes into voice agent flows that operate with high autonomy.
- Contribute to internal frameworks and reusable components for deploying voice agents faster.
What You Bring
- 4–7 years of hands-on AI/ML engineering experience.
- At least one real, production-deployed AI voice agent (inbound or outbound) that handled real customer calls.
- Deep expertise in:
  - Speech-to-Text (Whisper, AssemblyAI, Deepgram, Google STT)
  - LLMs (OpenAI, Anthropic, Llama, Gemini, DeepSeek)
  - Text-to-Speech (ElevenLabs, PlayHT, Azure TTS)
  - Telephony (Twilio, SIP, WebRTC, Asterisk, etc.)
  - RAG, embeddings, vector databases
  - Python/Node.js for AI agent orchestration
- Experience deploying production microservices (Docker, Kubernetes, CI/CD).
- Strong latency optimization and debugging skills, especially for real-time audio.
- Ability to work in fast-paced environments: MVP in weeks, robust production in months.
Nice to Have
- Experience with vendor ecosystems like ElevenLabs, PolyAI, Sierra, Decagon, Replicant.
- Experience building frameworks for agent evaluation and “reasoning trace” inspection.
- Background in conversation design or dialog management.
- Experience with reinforcement learning for dialog optimization.
Success Looks Like
In the first 2–4 months, you will:
- Deliver at least one production-ready voice agent handling real customer traffic.
- Build internal tooling that reduces time to deploy new agents.
- Improve latency, call success rate, and downstream business outcomes.
- Establish metrics, dashboards, and evaluation tooling for all voice agents.
- Become the go-to owner for AI voice agent architecture and innovation.
Why Join Us
- Work directly with experienced founders and a high-caliber product/engineering team.
- Own end-to-end systems with massive autonomy.
- Ship real products that touch thousands of customers.
- Opportunity to shape core AI infrastructure and voice agent strategy.
- Competitive compensation and a high-growth environment.
How to Apply
Send your GitHub/portfolio, past projects, and a brief note on the voice agents you’ve deployed to: jobs@vikara.ai
Required: Please create your profile on Aithors.ai.
Job Type: Full-time
Pay: ₹2,200,000.00 - ₹3,500,000.00 per year
Benefits: