Job Purpose
We are seeking a dynamic AI/ML Engineer to join our pioneering voice Gen AI R&D team. The ideal candidate will have a strong foundation in machine learning and a passion for innovation. This role involves developing advanced voice AI solutions.
Duties and Responsibilities
Research and Innovation: Stay abreast of the latest advancements in Gen AI/ML technologies, contributing to research initiatives and applying innovative solutions to practical problems
Generative AI Model Optimization:
Fine-tune LLMs/SLMs with proprietary NBFC data
Perform distillation and quantization of LLMs for edge deployment
Evaluate and run LLMs/SLMs on local/edge server machines
Conversational Intelligence:
Develop and fine-tune bots capable of negotiation, using contextual understanding, emotion detection, and dynamic loan-pitch logic
Build intelligent dialogue management frameworks that adapt in real time
Speech Technology R&D:
Evaluate Speech-to-Speech (S2S) models for natural voice responses
Assess STT model accuracy on Indic dialects; explore emotion-aware TTS engines
Experiment with speaker diarization for multi-speaker environments
Voice Biometrics & Security:
Collect and analyze voice samples for biometric model training
Evaluate biometric algorithms for fraud prevention and authentication
Implement anti-spoofing techniques to prevent deepfakes/recorded attacks
Ensure data privacy compliance in voice data usage
Self-Learning Frameworks:
Build self-learning systems that adapt without full retraining (e.g., learn new rejection patterns from calls)
Implement lightweight local models to enable real-time learning on the edge
Key Decisions / Dimensions
Model Selection & Customization
Choosing the right STT, TTS, and S2S models for various Indic languages and dialects
Deciding between open-source vs. commercial APIs based on latency, cost, and control
LLM/SLM Strategy
Selecting appropriate LLM/SLM architectures for dialogue management and negotiation logic
Deciding what to fine-tune, distill, or quantize, and what to leave generic
Edge vs. Cloud Architecture
Making trade-offs between on-device processing and cloud-based orchestration
Defining what runs locally for speed/privacy and what needs backend support
Emotion & Dialogue Logic Integration
Mapping emotional cues to appropriate TTS responses and negotiation tone
Designing fallback logic for unrecognized or hostile user responses
Voice Biometrics Algorithm Evaluation
Choosing and testing biometric algorithms for authentication and anti-spoofing
Deciding thresholds for matching, rejection, and fraud escalation
Major Challenges
Building a bot that doesn't just answer but negotiates with human-like reasoning
Running large models (LLM/STT/TTS) in low-latency, low-bandwidth environments without cloud dependency
Understanding caller emotions in noisy, multilingual conditions (anger, hesitation, sarcasm)
Ensuring STT and TTS pipelines work well with dialect-rich, low-resource Indian languages
Preventing fraud via recorded calls or deepfake voices
Enabling the bot to learn from failed interactions
Required Qualifications and Experience
Educational Background: Bachelor's or Master's degree in Computer Science, Engineering, or a related field
Experience: 2 to 8 years of experience in AI/ML, with exposure to Natural Language Processing (NLP) and speech technologies
Strong experience in speech AI (STT, TTS, S2S, speaker diarization, or related areas)
Proficiency in LLMs/SLMs and in Hugging Face, LangChain, or the OpenAI stack
Experience with model optimization techniques (quantization, distillation)
Knowledge of edge AI deployment and low-latency serving
Understanding of emotion modeling, biometric systems, and anti-spoofing
Experience in Python, PyTorch/TensorFlow, and scalable deployment workflows
Bonus: Experience in Indian language dialects, voice data collection, or field deployments in semi-urban/rural settings
Key skills: LLM fine-tuning; speech AI (STT, TTS, S2S, speaker diarization)