Voice AI & Real-Time Communication Engineer (Junior to Senior)
Aivar Innovations
Location:
Job Overview
Aivar Innovations is seeking Voice AI and Real-Time Communication Engineers/Architects to join our innovative team, focusing on building production-grade conversational AI systems and voice automation solutions. This role involves designing and implementing state-of-the-art voice agents that handle real-time audio processing, speech recognition, synthesis, and natural conversations at scale.
As part of the Aivar Innovations team, candidates will architect voice systems capable of powering thousands of conversations daily, maintaining sub-second latency, and gracefully handling production edge cases across diverse industry verticals.
Key Responsibilities
- Design, develop, and deploy production-grade voice AI agents using frameworks like Pipecat and LiveKit
- Build and optimize real-time voice processing pipelines with Speech-to-Text (STT) and Text-to-Speech (TTS) technologies
- Implement speaker diarization systems to identify and segment multiple speakers in conversations
- Develop voice communication infrastructure using WebRTC, WebSocket, SIP, and RTP protocols
- Integrate voice agents with telephony systems (Twilio, Telnyx) for inbound and outbound calling
- Architect low-latency, high-availability voice systems handling thousands of concurrent calls
- Build voice orchestration layers connecting STT, LLMs, and TTS with minimal latency
- Implement Voice Activity Detection (VAD), echo cancellation, and audio processing optimizations
- Deploy and monitor voice AI applications in production cloud environments
- Collaborate with product and engineering teams to define voice AI use cases and implement solutions
Required Technical Skills
Voice AI & Conversational Systems
- Hands-on experience building voice agents using Pipecat, LiveKit, or similar frameworks
- Deep expertise in Speech-to-Text (STT) systems: Deepgram, Whisper, AssemblyAI, Google STT, Azure Speech
- Proficiency with Text-to-Speech (TTS) platforms: ElevenLabs, Cartesia, Amazon Polly, Azure TTS
- Experience with speaker diarization and utterance segmentation for multi-speaker scenarios
- Knowledge of voice agent orchestration platforms (VAPI, Retell) and custom implementations
Real-Time Communication Protocols
- Strong understanding of WebRTC architecture, including ICE, STUN, TURN, and SRTP
- Experience with SIP (Session Initiation Protocol) and RTP/RTCP for VoIP systems
- Proficiency in WebSocket communication for real-time bidirectional data transfer
- Knowledge of telephony integration, call routing logic, and media servers (FreeSWITCH, Asterisk)
Audio Processing & Media
- Experience with audio codecs (Opus, G.711, G.729) and media streaming protocols
- Understanding of Voice Activity Detection (VAD), echo cancellation, and noise suppression
- Knowledge of audio processing pipelines and real-time media handling
LLM Integration & AI
- Deep knowledge of Large Language Models (LLMs) and optimization for low-latency responses
- Experience integrating conversational AI with voice pipelines (GPT-4o, Claude, etc.)
- Prompt engineering and conversation design for natural voice interactions
Programming & Development
- Strong programming skills in Python (primary), TypeScript, Node.js, or Golang
- Proficiency with AI/ML frameworks: TensorFlow, PyTorch, Scikit-learn
- Experience with real-time streaming systems and distributed architectures
Cloud & Infrastructure (Preferred)
- AWS Services: Lambda, EC2, EKS, S3, DynamoDB, Amazon Bedrock, Polly, Transcribe
- Knowledge of containerization (Docker, Kubernetes) and CI/CD pipelines
- Experience deploying voice AI systems in production with monitoring and observability
Required Qualifications
- Specialized experience building production voice AI systems handling real customers at scale
- Demonstrated track record with conversational AI, voice agents, and real-time communication
- Portfolio showcasing voice AI implementations, including latency optimization and call handling
- Experience maintaining sub-second latency and handling edge cases in production voice systems
Preferred Add-Ons
- Experience with low-code voice platforms (VAPI, Retell) and custom infrastructure development
- Knowledge of media servers and SFU/MCU architectures for scalable voice systems
- Familiarity with Indian language models for STT/TTS applications
- Experience with voice analytics, call transcription, and quality monitoring systems
- Background in VoIP development with Asterisk, FreeSWITCH, Kamailio, or OpenSIPS
- Certifications in speech technologies or cloud platforms (AWS, Azure)
Preferred AWS Certifications
While AWS certifications are not mandatory initially, candidates possessing relevant certifications will be given preference:
- AWS Certified Solutions Architect - Associate
- AWS Certified Machine Learning - Specialty
- AWS Certified Solutions Architect - Professional
- AWS Certified AI Practitioner
Essential Soft Skills
- Exceptional Communication: Ability to present complex voice AI architectures to technical and non-technical stakeholders
- Collaborative Leadership: Proven experience working with cross-functional teams, including product, operations, and customer-facing roles
- Innovative Problem-Solving: Demonstrated ability to tackle production voice system challenges with creative solutions
- Bias Toward Action: Ship daily, measure success by business impact, and iterate rapidly
- Adaptability: Capacity to learn emerging voice AI technologies and frameworks in fast-paced environments