Voice AI Engineer

0 - 4 years

2 - 14 Lacs

Posted: 1 day ago | Platform: Indeed

Work Mode: On-site

Job Type: Full Time

Job Description

Job Title: Voice AI Engineer

Location: Gurgaon
Company: Insybit
Product: Oktivo
Experience: 1–4 years
Type: Full-time

Role Overview

We’re looking for a Voice AI Engineer who can design and scale Oktivo’s voice intelligence stack — from real-time speech pipelines to contextual LLM reasoning and telephony orchestration.
You’ll help us move from API dependency to proprietary intelligence, shaping how our AI agents listen, learn, and act in live customer conversations.

This is a founding-level engineering role — you’ll work directly with the founder and own the technical roadmap for voice orchestration, contextual memory, and RAG-powered decision-making.

How to Apply

Apply link: https://forms.gle/W7DgHHJaqgLxbZQw7

Alternatively, send an email to contact@insybit.com

Subject: Voice AI Engineer — [Your Name]
Include: links to your LinkedIn, GitHub, portfolio, or a demo of any voice or LLM-based project.

Key Responsibilities

  • Architect, build, and optimize the Oktivo Voice AI pipeline (STT → LLM → TTS) for real-time conversational performance.
  • Integrate and fine-tune speech and language models (Deepgram, ElevenLabs, Whisper, Gemini, Llama 3, etc.) for low-latency inference.
  • Develop RAG pipelines using vector databases (Pinecone / Weaviate / FAISS) to connect call conversations, CRM data, and marketing intelligence in real time.
  • Design Oktivo’s Context Memory Layer: a persistent graph that stores conversation embeddings, user behavior, and campaign sources for contextual recall.
  • Train, fine-tune, and evaluate custom LLMs (domain-specific) for voice intent recognition, tone analysis, and sales conversation optimization.
  • Integrate with telephony systems (Twilio, Exotel, Kaleyra, SIP/WebRTC) and build scalable real-time call flows using LiveKit or WebRTC.
  • Collaborate with backend engineers to sync conversational data with CRMs (Zoho, HubSpot, Salesforce) and marketing attribution systems.
  • Optimize inference cost, caching, and latency for high throughput and enterprise-grade reliability.
  • Continuously test and improve voice quality, emotion control, and multilingual naturalness.

Required Skills

  • 3+ years of experience in Speech AI / Conversational AI / Agentic Systems.
  • Strong experience with STT/TTS pipelines — Deepgram, Whisper, ElevenLabs, Azure Speech, or custom acoustic models.
  • Hands-on with RAG frameworks (LangChain, LlamaIndex) and vector databases (Pinecone, Weaviate, FAISS).
  • Proficient with Python (FastAPI, LangChain) or Node.js for AI pipeline orchestration.
  • Solid understanding of LLMs (OpenAI, Gemini, Claude, Llama 3) and experience training or fine-tuning domain-specific models.
  • Experience integrating telephony systems (Twilio, Exotel, SIP/WebRTC/LiveKit).
  • Experience in real-time streaming and low-latency architecture for conversational AI.
  • Familiarity with cloud infrastructure (GCP / AWS) and GPU-based inference.

Bonus Skills

  • Prior work on voice bot orchestration or AI-powered contact center automation.
  • Understanding of CRM data models, marketing attribution, or customer journey analytics.
  • Knowledge of multi-turn dialog state management or reinforcement learning for conversational flow.
  • Experience building offline caching or edge inference pipelines for cost optimization.

What You’ll Build

  • Oktivo Agentic Core: Context Memory + Decision Graph that powers our voice AI.
  • Proprietary RAG Engine: Dynamic retrieval across CRM, marketing, and conversation datasets.
  • Voice AI Telephony Layer: AI agent stack that connects to real users through phone calls, WhatsApp, and other channels.

Why Join Oktivo

  • Be part of the founding tech team building India’s first Voice-First Agentic CRM platform.
  • Solve cutting-edge challenges in real-time speech + reasoning AI.
  • Work with large enterprise clients in BFSI, Real Estate, and E-commerce.
  • Own key IP in contextual memory, agentic actions, and voice orchestration.
  • Competitive salary + ESOPs + mentorship from leading MarTech & AI advisors.

Tech Stack

AI: Deepgram • ElevenLabs • Whisper • Gemini • Llama 3 • LangChain • LlamaIndex
Infra: Python • Node.js • FastAPI • GCP • Firebase • Postgres • Pinecone • Redis
Telephony: Twilio • Exotel • WebRTC • LiveKit • SIP
CRM Integrations: Salesforce • Zoho • HubSpot • Freshworks

Job Type: Full-time

Pay: ₹223,456.41 - ₹1,400,000.00 per year

Application Question(s):

  • What's your notice period? If you are already serving it, mention your last working date.
  • What's your current CTC?

  • What's your expected CTC?

  • Have you worked on Voice AI (building a TTS or STT engine)?

Location:

  • Gurugram, Haryana (Preferred)

Work Location: In person
