Lead Assistant Manager

2 years

5 - 10 Lacs

Posted: 12 hours ago | Platform: Glassdoor

Work Mode

On-site

Job Type

Full Time

Job Description

Lead Assistant Manager

EXL/LAM/1465683

    Digital Solutions, Noida
    Posted On: 29 Aug 2025
    End Date: 13 Oct 2025
    Required Experience: 2 - 5 Years

Basic Section

Number Of Positions

1

Band

B2

Band Name

Lead Assistant Manager

Cost Code

D014959

Campus/Non Campus

NON CAMPUS

Employment Type

Permanent

Requisition Type

New

Max CTC

1,000,000 - 2,500,000

Complexity Level

Not Applicable

Work Type

Hybrid – Working Partly From Home And Partly From Office

Organisational

Group

EXL Digital

Sub Group

Digital Solutions

Organization

Digital Solutions

LOB

CX(CONNECx)

SBU

Exelia.AI

Country

India

City

Noida

Center

Noida - Centre 59

Skills

Skill

PYTHON PROGRAMMING

NATURAL LANGUAGE PROCESSING - NLP

GCP

Minimum Qualification

BTECH

Certification

No data available

Job Description

Senior ASR/TTS Specialist - AI Agent Integration Expert

Company: EXL Service
Type: Full-time
Experience: 3+ years

Position Summary

We seek an exceptional Senior ASR/TTS Specialist to lead speech AI initiatives and integrate advanced speech technologies with AI agent frameworks. This role focuses on fine-tuning ASR/TTS models, implementing MLOps best practices, and building production-ready speech AI systems powering next-generation conversational AI agents.

Key Responsibilities

Speech AI Model Development & Integration

  • Model Fine-tuning: Customize state-of-the-art ASR/TTS models for domain-specific applications with <300ms latency
  • Speech-to-Speech Systems: Build end-to-end S2S pipelines using Amazon Nova Sonic v1.0, Azure OpenAI Realtime (GPT-4o), and Gemini 2.5 Flash Native Audio
  • Multi-modal Integration: Develop speech models integrating with vision and text modalities in AI agents
  • Agent Framework Integration: Implement speech capabilities with LangChain/LangGraph, CrewAI, AutoGen, LlamaIndex, and OpenAI Assistants API

MLOps & Production Engineering

  • Model Lifecycle: Implement comprehensive MLOps pipelines using MLflow, Weights & Biases, and automated CI/CD
  • Multi-cloud Deployment: Deploy speech models across AWS Bedrock, Google Cloud AI, and Azure Cognitive Services
  • Real-time Processing: Build WebSocket-based streaming audio systems handling 1000+ concurrent connections
  • Production Monitoring: Implement WER tracking, latency monitoring, and multi-provider failover mechanisms

Research & Development

  • Cutting-edge Research: Stay current with the latest speech AI breakthroughs and implement novel architectures
  • Performance Optimization: Optimize models for real-time inference using TensorRT, ONNX, and edge deployment
  • Data Pipeline Engineering: Build scalable audio ingestion, preprocessing, and augmentation systems

Required Qualifications

Core Technical Skills (Must-Have)

Speech AI Models (3+ years experience):

  • ASR Systems: Amazon Nova Sonic v1.0, Google Speech-to-Text, Azure Speech Services, Whisper, Wav2Vec2, Riva
  • TTS Systems: Google TTS, Azure Cognitive Services TTS, ElevenLabs (REST/WebSocket), Tortoise, VITS, FastSpeech2
  • Speech-to-Speech: Direct S2S without intermediate text, multimodal audio processing
  • Cloud Services: AWS Bedrock Runtime, Google Cloud AI (Gemini API), Azure OpenAI Services

Programming & Frameworks:

  • Languages: Expert Python, proficient C++/Rust for optimization
  • ML Frameworks: Advanced PyTorch, TensorFlow 2.x, JAX/Flax
  • Audio Processing: librosa, torchaudio, soundfile, WebRTC, µ-law/PCM conversion
  • Agent Frameworks: Hands-on experience with 3+ of: LangChain, CrewAI, AutoGen, LlamaIndex, OpenAI Assistants

MLOps & Infrastructure (Essential)

MLOps Tools (2+ years):

  • Experiment Management: MLflow, Weights & Biases
  • Model Serving: TorchServe, TensorFlow Serving, NVIDIA Triton
  • Workflow Orchestration: Apache Airflow, Kubeflow, Prefect
  • Containerization: Docker, Kubernetes for speech model deployment

Cloud & Production:

  • Multi-cloud Experience: AWS (Bedrock, Nova Sonic), Google Cloud (Gemini, Speech APIs), Azure (OpenAI Services)
  • Real-time Systems: Sub-300ms latency, WebSocket architecture, telecom integration (Genesys AudioConnector)
  • Monitoring: Audio quality metrics, model drift detection, production reliability (99.9% uptime)

Preferred Qualifications

Advanced Specializations

  • Multi-lingual Processing: Cross-lingual transfer learning, zero-shot adaptation
  • Domain Expertise: Healthcare, legal, technical domain speech AI
  • Edge AI: TensorRT, Core ML, ONNX optimization for mobile/edge deployment
  • Research Background: Publications in ICASSP, INTERSPEECH, ICML, NeurIPS

Leadership & Education

  • Team Leadership: Experience leading speech AI teams and technical initiatives
  • Education: MS/PhD in Computer Science, Electrical Engineering, or related field
  • Open Source: Contributions to speech AI libraries and frameworks

Technical Environment

Production Technology Stack

Core Technologies:

  • Languages: Python, C++, Rust, TypeScript
  • Frameworks: PyTorch, TensorFlow, JAX, LangChain, CrewAI, AutoGen
  • Cloud Services: AWS Bedrock, Google Cloud AI, Azure OpenAI Services
  • Audio Tools: librosa, torchaudio, WebRTC, FFmpeg
  • MLOps: MLflow, Kubeflow, Docker, Kubernetes, NVIDIA Triton
  • Databases: Vector DBs (Pinecone, Weaviate), PostgreSQL, Redis

Production Models:

  • Amazon Nova Sonic v1.0 (Speech-to-Speech streaming)
  • Gemini 2.5 Flash Native Audio Dialog (Multimodal processing)
  • Azure OpenAI GPT-4o (Realtime voice conversations)
  • ElevenLabs (Voice cloning and synthesis)

Infrastructure

  • GPU Clusters: NVIDIA A100/H100 for model training
  • Edge Deployment: NVIDIA Jetson, ARM-based targets
  • Real-time Requirements: <300ms latency, 1000+ concurrent streams
  • Enterprise Integration: Genesys AudioConnector, SIP protocol, telephony systems

Key Projects & Success Metrics

Primary Focus Areas

  • Next-gen S2S Systems: Amazon Nova Sonic, Azure OpenAI Realtime, Gemini Native Audio
  • Multi-cloud Integration: Unified APIs across AWS, Google Cloud, Azure
  • Conversational AI Agents: Low-latency speech-enabled customer service bots
  • Telecom Integration: Enterprise telephony and AudioConnector systems
  • Domain-specific Models: Medical, legal, technical vocabulary fine-tuning

Success Metrics

  • Performance: <5% WER for domain-specific tasks
  • Latency: <300ms end-to-end processing
  • Reliability: 99.9% uptime for production services
  • Scale: 1000+ concurrent speech streams


Workflow

Workflow Type

Digital Solution Center
