Job Description
Job Purpose
We are seeking a dynamic AI/ML Engineer to join our pioneering voice Gen AI R&D team. The ideal candidate will possess a strong foundation in machine learning and a passion for innovation. This role involves developing advanced voice AI solutions.
Duties and Responsibilities
Research and Innovation: Stay abreast of the latest advancements in Gen AI/ML technologies, contributing to research initiatives and applying innovative solutions to practical problems.
Generative AI & Model Optimization:
Fine-tune LLMs/SLMs with proprietary NBFC data.
Perform distillation and quantization of LLMs for edge deployment.
Evaluate and run LLM/SLM models on local/edge server machines.
Conversational Intelligence:
Develop and fine-tune bots capable of negotiation using contextual understanding, emotion detection, and dynamic loan pitch logic.
Build intelligent dialogue management frameworks that adapt in real time.
Speech Technology R&D:
Evaluate Speech-to-Speech (S2S) models for natural voice responses.
Assess STT models for accuracy on Indic dialects; explore emotion-aware TTS engines.
Experiment with speaker diarization for multi-speaker environments.
Voice Biometrics & Security:
Collect and analyze voice samples for biometric model training.
Evaluate biometric algorithms for fraud prevention and authentication.
Implement anti-spoofing techniques to prevent deepfake and recorded-playback attacks.
Ensure data privacy compliance in voice data usage.
Self-Learning Frameworks:
Build self-learning systems that adapt without full retraining (e.g., learn new rejection patterns from calls).
Implement lightweight local models to enable real-time learning on the edge.
Key Decisions / Dimensions
Model Selection & Customization
Choosing the right STT, TTS, and S2S models for various Indic languages and dialects.
Deciding between open-source and commercial APIs based on latency, cost, and control.
LLM/SLM Strategy
Selecting appropriate LLM/SLM architectures for dialogue management and negotiation logic.
Deciding what to fine-tune, distill, or quantize, and what to leave generic.
Edge vs. Cloud Architecture
Making trade-offs between on-device processing and cloud-based orchestration.
Defining what runs locally for speed/privacy and what needs backend support.
Emotion & Dialogue Logic Integration
Mapping emotional cues to appropriate TTS responses and negotiation tone.
Designing fallback logic for unrecognized or hostile user responses.
Voice Biometrics Algorithm Evaluation
Choosing and testing biometric algorithms for authentication and anti-spoofing.
Deciding thresholds for matching, rejection, and fraud escalation.
Major Challenges
Building a bot that doesn't just answer but negotiates with human-like reasoning.
Running large models (LLM/STT/TTS) in low-latency, low-bandwidth environments without cloud dependency.
Understanding caller emotions in noisy, multilingual conditions (anger, hesitation, sarcasm).
Ensuring STT and TTS pipelines work well with dialect-rich, low-resource Indian languages.
Preventing fraud via recorded calls or deepfake voices.
Ensuring the bot learns from failed interactions.
Required Qualifications and Experience
Educational Background: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Experience: 2-8 years of experience in AI/ML, with exposure to Natural Language Processing (NLP) and speech technologies.
Strong experience in Speech AI (STT, TTS, S2S, speaker diarization) or related areas.
Proficiency with LLMs/SLMs and the Hugging Face, LangChain, or OpenAI stack.
Experience with model optimization techniques (quantization, distillation).
Knowledge of edge AI deployment and low-latency serving.
Understanding of emotion modeling, biometric systems, and anti-spoofing.
Experience in Python, PyTorch/TensorFlow, and scalable deployment workflows.
Bonus: Experience with Indian language dialects, voice data collection, or field deployments in semi-urban/rural settings.
LLM fine-tuning, Speech AI (STT, TTS, S2S), speaker diarization.