Lead Assistant Manager

3 years

0 Lacs

Posted: 1 week ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Senior ASR/TTS Specialist - AI Agent Integration Expert

Company: EXL Service

Type: Full-time

Experience: 3+ years

Position Summary

We seek an exceptional Senior ASR/TTS Specialist to lead speech AI initiatives and integrate advanced speech technologies with AI agent frameworks. This role focuses on fine-tuning ASR/TTS models, implementing MLOps best practices, and building production-ready speech AI systems that power next-generation conversational AI agents.

Key Responsibilities

Speech AI Model Development & Integration

  • Model Fine-tuning : Customize state-of-the-art ASR/TTS models for domain-specific applications with <300ms latency
  • Speech-to-Speech Systems : Build end-to-end S2S pipelines using Amazon Nova Sonic v1.0, Azure OpenAI Realtime (GPT-4o), and Gemini 2.5 Flash Native Audio
  • Multi-modal Integration : Develop speech models integrating with vision and text modalities in AI agents
  • Agent Framework Integration : Implement speech capabilities with LangChain/LangGraph, CrewAI, AutoGen, LlamaIndex, and OpenAI Assistants API

    MLOps & Production Engineering

    • Model Lifecycle : Implement comprehensive MLOps pipelines using MLflow, Weights & Biases, and automated CI/CD
    • Multi-cloud Deployment : Deploy speech models across AWS Bedrock, Google Cloud AI, and Azure Cognitive Services
    • Real-time Processing : Build WebSocket-based streaming audio systems handling 1000+ concurrent connections
    • Production Monitoring : Implement WER tracking, latency monitoring, and multi-provider failover mechanisms
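
    As a rough illustration of the latency monitoring and multi-provider failover named above, here is a minimal Python sketch. The provider callables, the `transcribe_with_failover` helper, and the `AllProvidersFailed` exception are hypothetical names invented for this example; they are not part of any vendor SDK, and a production system would add timeouts, retries, and metrics export.

```python
import time
from typing import Callable, List, Sequence, Tuple

LATENCY_BUDGET_S = 0.300  # the posting's <300ms target, used here as an assumed budget


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider raised an error (hypothetical helper type)."""


def transcribe_with_failover(
    audio: bytes,
    providers: Sequence[Tuple[str, Callable[[bytes], str]]],
) -> Tuple[str, str, float]:
    """Try each provider in order; return (provider_name, transcript, elapsed_seconds).

    `providers` is an ordered list of (name, callable) pairs. The callables are
    stand-ins for real ASR clients and are assumptions of this sketch.
    """
    errors: List[str] = []
    for name, call in providers:
        start = time.perf_counter()
        try:
            text = call(audio)
        except Exception as exc:  # any provider error triggers failover to the next one
            errors.append(f"{name}: {exc}")
            continue
        elapsed = time.perf_counter() - start
        return name, text, elapsed
    raise AllProvidersFailed("; ".join(errors))


def within_budget(elapsed_s: float, budget_s: float = LATENCY_BUDGET_S) -> bool:
    """Report whether a single call met the latency budget."""
    return elapsed_s < budget_s
```

    The ordered list keeps the policy simple: the first provider that answers wins, and the measured latency can be compared against the budget with `within_budget` before the result is logged.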

    Research & Development

    • Cutting-edge Research : Stay current with latest speech AI breakthroughs and implement novel architectures
    • Performance Optimization : Optimize models for real-time inference using TensorRT, ONNX, and edge deployment
    • Data Pipeline Engineering : Build scalable audio ingestion, preprocessing, and augmentation systems

    Required Qualifications

    Core Technical Skills (Must-Have)

    Speech AI Models (3+ years experience):

    • ASR Systems : Amazon Nova Sonic v1.0, Google Speech-to-Text, Azure Speech Services, Whisper, Wav2Vec2, Riva
    • TTS Systems : Google TTS, Azure Cognitive Services TTS, ElevenLabs (REST/WebSocket), Tortoise, VITS, FastSpeech2
    • Speech-to-Speech : Direct S2S without intermediate text, multimodal audio processing
    • Cloud Services : AWS Bedrock Runtime, Google Cloud AI (Gemini API), Azure OpenAI Services

    Programming & Frameworks:

    • Languages : Expert Python, proficient C++/Rust for optimization
    • ML Frameworks : Advanced PyTorch, TensorFlow 2.x, JAX/Flax
    • Audio Processing : librosa, torchaudio, soundfile, WebRTC, µ-law/PCM conversion
    • Agent Frameworks : Hands-on experience with 3+ of: LangChain, CrewAI, AutoGen, LlamaIndex, OpenAI Assistants
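
    For readers unfamiliar with the µ-law/PCM conversion listed under audio processing, the sketch below shows the idea using the continuous µ-law companding formula with µ = 255. It is an illustration only: the function names are made up for this example, and the 8-bit codes it produces follow a simple uniform quantization of the companded value, not the bit-exact segment layout of the G.711 telephony codec.

```python
import math
from typing import List

MU = 255  # companding parameter for 8-bit mu-law


def mulaw_compress(sample: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    sample = max(-1.0, min(1.0, sample))
    return math.copysign(math.log1p(MU * abs(sample)) / math.log1p(MU), sample)


def mulaw_expand(value: float) -> float:
    """Inverse of mulaw_compress: companded value in [-1, 1] back to linear."""
    value = max(-1.0, min(1.0, value))
    return math.copysign(math.expm1(abs(value) * math.log1p(MU)) / MU, value)


def pcm16_to_mulaw_codes(pcm: List[int]) -> List[int]:
    """Quantize signed 16-bit PCM samples to 8-bit codes in 0..255 (not G.711 bit-exact)."""
    codes = []
    for s in pcm:
        y = mulaw_compress(s / 32768.0)
        codes.append(int(round((y + 1.0) / 2.0 * 255)))
    return codes


def mulaw_codes_to_pcm16(codes: List[int]) -> List[int]:
    """Expand 8-bit codes back to signed 16-bit PCM, clipping to the valid range."""
    out = []
    for c in codes:
        x = mulaw_expand(c / 255 * 2.0 - 1.0)
        out.append(max(-32768, min(32767, int(round(x * 32768)))))
    return out
```

    Companding spends more of the 8-bit range on quiet samples, which is why a round trip through `pcm16_to_mulaw_codes` and `mulaw_codes_to_pcm16` reproduces low-amplitude speech more closely than a plain 8-bit linear quantizer would.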

    MLOps & Infrastructure (Essential)

    MLOps Tools (2+ years):

    • Experiment Management : MLflow, Weights & Biases
    • Model Serving : TorchServe, TensorFlow Serving, NVIDIA Triton
    • Workflow Orchestration : Apache Airflow, Kubeflow, Prefect
    • Containerization : Docker, Kubernetes for speech model deployment

    Cloud & Production:

    • Multi-cloud Experience : AWS (Bedrock, Nova Sonic), Google Cloud (Gemini, Speech APIs), Azure (OpenAI Services)
    • Real-time Systems : Sub-300ms latency, WebSocket architecture, telecom integration (Genesys AudioConnector)
    • Monitoring : Audio quality metrics, model drift detection, production reliability (99.9% uptime)

    Preferred Qualifications

    Advanced Specializations

    • Multi-lingual Processing : Cross-lingual transfer learning, zero-shot adaptation
    • Domain Expertise : Healthcare, legal, technical domain speech AI
    • Edge AI : TensorRT, Core ML, ONNX optimization for mobile/edge deployment
    • Research Background : Publications in ICASSP, INTERSPEECH, ICML, NeurIPS

    Leadership & Education

    • Team Leadership : Experience leading speech AI teams and technical initiatives
    • Education : MS/PhD in Computer Science, Electrical Engineering, or related field
    • Open Source : Contributions to speech AI libraries and frameworks

    Technical Environment

    Production Technology Stack

    Core Technologies:

    • Languages : Python, C++, Rust, TypeScript
    • Frameworks : PyTorch, TensorFlow, JAX, LangChain, CrewAI, AutoGen
    • Cloud Services : AWS Bedrock, Google Cloud AI, Azure OpenAI Services
    • Audio Tools : librosa, torchaudio, WebRTC, FFmpeg
    • MLOps : MLflow, Kubeflow, Docker, Kubernetes, NVIDIA Triton
    • Databases : Vector DBs (Pinecone, Weaviate), PostgreSQL, Redis

    Production Models:

    • Amazon Nova Sonic v1.0 (Speech-to-Speech streaming)
    • Gemini 2.5 Flash Native Audio Dialog (Multimodal processing)
    • Azure OpenAI GPT-4o (Realtime voice conversations)
    • ElevenLabs (Voice cloning and synthesis)

    Infrastructure

  • GPU Clusters : NVIDIA A100/H100 for model training
  • Edge Deployment : NVIDIA Jetson, ARM-based targets
  • Real-time Requirements : <300ms latency, 1000+ concurrent streams
  • Enterprise Integration : Genesys AudioConnector, SIP protocol, telephony systems

    Key Projects & Success Metrics

    Primary Focus Areas

    • Next-gen S2S Systems : Amazon Nova Sonic, Azure OpenAI Realtime, Gemini Native Audio
    • Multi-cloud Integration : Unified APIs across AWS, Google Cloud, Azure
    • Conversational AI Agents : Low-latency speech-enabled customer service bots
    • Telecom Integration : Enterprise telephony and AudioConnector systems
    • Domain-specific Models : Medical, legal, technical vocabulary fine-tuning

    Success Metrics

  • Performance : <5% WER for domain-specific tasks
  • Latency :
  • Reliability : 99.9% uptime for production services
  • Scale : 1000+ concurrent speech streams
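
    To make the WER target above concrete: word error rate is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. A <5% WER therefore allows fewer than one error per 20 reference words. The sketch below is a plain dynamic-programming implementation; the function name is chosen for this example, and it skips the text normalization (casing, punctuation, numerals) that real evaluations usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

    Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference, so it is a ratio and not a bounded accuracy score.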
    EXL

    Business Process Management / Analytics

    New York
