Gurugram, Haryana, India
Not disclosed
On-site
Full Time
To Apply - Fill this form: https://lnkd.in/gbveTe5r

We’re looking for a Data Scientist specializing in NLP and conversational AI to join our core team working at the frontier of real-time voice intelligence. You'll build systems that combine intent detection, generative reasoning, and speech/audio insights to power responsive and persuasive AI callbots. This role is ideal for someone who thrives in applied NLP and voice tech, and who enjoys the challenges of building scalable AI systems that feel natural and human-like.

Job Title: Data Scientist – NLP & Conversational AI
Location: Gurugram (On Site)
Experience: 2+ years
Working Hours: Full time

Key Responsibilities
Build and refine models for intent recognition, speech act classification, and response routing
Design lightweight NLP classifiers and heuristics to optimize LLM usage
Improve conversational dynamics using turn-taking prediction, pause detection, and filler word modeling
Create classifiers to detect IVRs, gatekeepers, voicemail systems, and target respondents
Build or fine-tune small models for emotion detection, dialogue state tracking, or speaker intent
Work on context-aware response selection systems (e.g., RAG pipelines with pre-recorded replies)
Collaborate closely with engineering teams to deploy NLP models in real-time inference systems
Evaluate model performance across latency, accuracy, fallback behavior, and human-likeness

Core Skills
NLP frameworks: HuggingFace Transformers, spaCy, NLTK, fastText, Sentence Transformers
ML libraries: scikit-learn, PyTorch, TensorFlow, LightGBM
Audio signal processing: MFCCs, VAD, filler detection, silence segmentation
Real-time or low-latency model design experience (quantization, distillation, pruning)
RAG: Retrieval-Augmented Generation pipelines using vector stores (FAISS, Chroma, Pinecone)
Understanding of dialog state management, dialogue act tagging, and conversational UX design

Preferred / Bonus Skills
Familiarity with Whisper, Coqui, Bark, or other open-source STT/TTS models
Prompt engineering or LLM optimization (OpenAI, Claude, LLaMA, Mistral)
Experience with streaming inference architectures or edge AI (e.g., LLaMA.cpp)
Exposure to hybrid response generation: LLM + pre-recorded audio systems
Use of experiment tracking tools like MLflow or Weights & Biases

General Qualities We Value
Comfort working in fast-paced, ambiguous environments
Startup or early-stage product experience
Strong applied portfolio: GitHub, notebooks, demos, or Kaggle/NLP competition track record
Eagerness to build production-ready, real-time ML features
Curiosity, creativity, and a collaborative mindset
Gurugram, Haryana, India
Not disclosed
On-site
Full Time
To apply, please fill in this form: https://lnkd.in/g4KwqECF

Company Description
NovaIA offers an AI-powered voice assistant tool designed to support human agents in real time. Particularly tailored for real estate agencies, the assistant can make calls, follow up with leads, filter prospects, and schedule appointments. Key features include real-time agent support and appointment management automation. The assistant listens in on conversations, providing live guidance, data, or suggestions, and seamlessly handles follow-ups and meeting setups through voice interactions.

Role Description
This is a full-time Data Engineer role located on-site in Gurugram. We’re looking for a Data Engineer to design, implement, and scale the data pipelines that power our real-time, voice-driven AI experiences. In this role, you’ll work with large volumes of structured and unstructured data, enabling ultra-low-latency processing across speech-to-text (STT), natural language processing (NLP), and text-to-speech (TTS) modules. You’ll partner closely with machine learning engineers, product managers, and DevOps teams to ensure our data infrastructure is fast, reliable, and production-ready, directly shaping how our AI interacts with users in real time.

Key Responsibilities
Design & Implement Pipelines – Build robust, low-latency pipelines for real-time STT input, NLP processing, and TTS output.
Ingestion Systems – Develop scalable ingestion for audio logs, model artifacts, and interaction metadata.
Stream Management – Manage message queues and streaming data for efficient voice call routing and real-time responses.
Caching & Prefetching – Optimize caching layers and prefetching logic for pre-recorded response fragments.
ETL/ELT Workflows – Create workflows for downstream analytics, monitoring, and continuous feedback loops.
Session Memory – Develop and manage session memory stores for dynamic context handling.
Data Governance – Ensure data versioning, schema consistency, and lineage tracking.
Cost Optimization – Collaborate on token usage optimization and infrastructure cost reporting.

Core Skills
Data Pipeline Orchestration: Apache Airflow, Prefect, Luigi, dbt
Stream Processing: Kafka, Apache Flink, Redis Streams, RabbitMQ
Programming: Python, SQL; familiarity with Java/Scala is a plus
Cloud Platforms: AWS (Kinesis, S3, Lambda), GCP (Pub/Sub, BigQuery), or Azure equivalents
Storage Systems: PostgreSQL, DynamoDB, Parquet, Snowflake, Delta Lake
Data Quality & Observability: Schema validation, Great Expectations, monitoring tools
Audio Data Handling: Experience with transcription logs, metadata tagging, media storage
Version Control & CI/CD for Data: Git, DVC, automated testing workflows

Preferred / Bonus Skills
Familiarity with ML model pipelines and experiment tracking
Experience with real-time ETL optimization and low-latency microservices
Knowledge of vector databases (FAISS, Chroma, Pinecone)
Experience with WebRTC, SIP, or other real-time audio systems
Understanding of data governance and compliance (PII masking, audit trails)
Haryana
INR Not disclosed
On-site
Full Time
We are looking for a Data Scientist specializing in NLP and conversational AI to join our core team focused on real-time voice intelligence. In this role, you will build systems that combine intent detection, generative reasoning, and speech/audio insights to develop responsive and persuasive AI callbots. The position suits people who excel in applied NLP and voice technology and are passionate about creating scalable AI systems that feel natural and human-like.

The role is based in Gurugram and requires at least 2 years of experience. This is a full-time position in which you will work with the engineering teams to deploy NLP models in real-time inference systems. Your responsibilities will include building and refining models for intent recognition, speech act classification, and response routing. You will also design lightweight NLP classifiers and heuristics to optimize LLM usage, and improve conversational dynamics using techniques such as turn-taking prediction and pause detection.

To excel in this role, you should be proficient in NLP frameworks such as HuggingFace Transformers, spaCy, NLTK, fastText, and Sentence Transformers. Familiarity with ML libraries like scikit-learn, PyTorch, TensorFlow, and LightGBM is essential. Experience with audio signal processing techniques like MFCCs, VAD, filler detection, and silence segmentation would be beneficial. Knowledge of real-time or low-latency model design, RAG pipelines, dialog state management, dialogue act tagging, and conversational UX design is also required.

Preferred or bonus skills include familiarity with Whisper, Coqui, Bark, or other open-source STT/TTS models, prompt engineering, LLM optimization, and experience with streaming inference architectures or edge AI. Exposure to hybrid response generation systems and experience with experiment tracking tools like MLflow or Weights & Biases would be a plus.

The ideal candidate is comfortable working in fast-paced, ambiguous environments, has startup or early-stage product experience, and can show a strong applied portfolio: GitHub repositories, notebooks, demos, or a track record in Kaggle/NLP competitions. Eagerness to develop production-ready, real-time ML features, along with curiosity, creativity, and a collaborative mindset, is highly valued for this role.
Gurugram, Haryana, India
Not disclosed
On-site
Full Time
Company Description
NovaIA offers an AI-powered voice assistant tool designed to support human agents in real time. Particularly tailored for real estate agencies, the assistant can make calls, follow up with leads, filter prospects, and schedule appointments. Key features include real-time agent support and appointment management automation. The assistant listens in on conversations, providing live guidance, data, or suggestions, and seamlessly handles follow-ups and meeting setups through voice interactions.

We’re looking for a versatile and hands-on Data Scientist who can bridge the gap between traditional machine learning and conversational AI. You'll work on predictive modeling tasks (e.g., user behavior, conversion forecasting) and also contribute to intelligent voicebots that respond in real time. This role offers a balance of experimentation, productionization, and product collaboration, ideal for someone who thrives at the intersection of models and applications.

Job Title: Data Scientist – Predictive Modeling & Conversational AI
Location: Gurugram (On Site)
Experience: 3+ years
Working Hours: Full time

Key Responsibilities
Design, build, and evaluate machine learning models for classification, regression, clustering, and ranking use cases
Analyze large datasets to extract insights, train predictive models, and improve decision-making
Lead and support analytics use cases such as behavior prediction, engagement scoring, and feature engineering
Work on NLP/NLU tasks including intent recognition, entity extraction, summarization, and semantic similarity
Contribute to conversational AI logic such as dynamic routing, fallback response logic, and session personalization
Design evaluation frameworks to assess real-time model performance in voice-based interfaces
Collaborate with engineering teams to deploy models in production environments and monitor model health

Core Skills
ML/DS toolkits: scikit-learn, XGBoost, LightGBM, CatBoost, PyCaret
Data wrangling: pandas, NumPy, Polars, SQL (PostgreSQL, BigQuery)
NLP frameworks: HuggingFace Transformers, spaCy, NLTK, fastText
ML ops understanding: model versioning, performance monitoring, feature store design
Working with structured + unstructured data (voice/text/logs)
Comfortable writing modular, reusable code in Python or notebooks with best practices

Preferred / Bonus Skills
LLM integration: prompt engineering, fine-tuning open-source models (e.g., Mistral, LLaMA)
Time series forecasting (Prophet, ARIMA, or ML-based)
Recommender systems or ranking algorithms (collaborative filtering, hybrid models)
Familiarity with RAG pipelines, embeddings, vector search, and hybrid retrieval
Experience using experiment tracking tools (MLflow, Weights & Biases, DVC)
Exposure to speech/audio data analytics

General Qualities We Value
Comfort working in fast-paced, ambiguous environments
Startup or early product-building experience with cross-functional teams
Strong problem-solving ability and interest in building user-facing intelligence
Demonstrated portfolio of work (e.g., GitHub, notebooks, blog posts, Kaggle)
Curiosity, autonomy, and eagerness to contribute across the stack when needed

Note: If a question is not applicable, write NA.
Gurugram, Haryana, India
Not disclosed
On-site
Full Time
Company Description
NovaIA offers an AI-powered voice assistant tool designed to support human agents in real time. Particularly tailored for real estate agencies, the assistant can make calls, follow up with leads, filter prospects, and schedule appointments. Key features include real-time agent support and appointment management automation. The assistant listens in on conversations, providing live guidance, data, or suggestions, and seamlessly handles follow-ups and meeting setups through voice interactions.

We're hiring a Data Engineer to design, implement, and scale robust data pipelines that power our real-time voice-based AI systems. This role involves working with large volumes of structured and unstructured data, enabling low-latency processing across speech-to-text (STT), natural language processing (NLP), and text-to-speech (TTS) modules. You’ll collaborate closely with machine learning engineers, product teams, and DevOps to ensure data availability, reliability, and performance in production environments.

Job Title: Data Engineer – Real-Time & ML Pipelines
Location: Gurugram (On Site)
Experience: 3+ years
Working Hours: Full time

Key Responsibilities
Design and implement data pipelines for real-time STT input, NLP processing, and TTS output
Build scalable ingestion systems for audio logs, model artifacts, and interaction metadata
Manage message queues and streaming data for efficient voice call routing and response
Optimize caching layers and prefetching logic for pre-recorded response fragments
Create ETL/ELT workflows for downstream analytics, monitoring, and feedback loops
Develop and manage session memory stores for dynamic context handling
Ensure data versioning, schema consistency, and lineage tracking
Collaborate on token usage optimization and infrastructure cost reporting

Core Skills
Data pipeline orchestration: Kubernetes
Stream processing: Kafka, Apache Flink, Redis Streams, RabbitMQ
Programming: Python, SQL; familiarity with Java/Scala is a plus
Cloud-native architecture: AWS (Kinesis, S3, Lambda), GCP (Pub/Sub, BigQuery), or Azure equivalents
Storage systems: PostgreSQL, DynamoDB, Parquet, Snowflake, Delta Lake
Data quality, schema validation, and observability tools
Experience working with audio data (transcription logs, metadata tagging, media storage)
Version control & CI/CD for data (DVC, Great Expectations, Git)

Preferred / Bonus Skills
Familiarity with ML model pipelines and experiment tracking
Real-time ETL optimization and low-latency microservices
Knowledge of vector databases (e.g., FAISS, Chroma, Pinecone)
Experience with WebRTC, SIP, or real-time audio systems
Data governance and compliance (PII masking, audit trails)

General Qualities We Value
Comfort working in fast-paced, ambiguous environments
Startup or zero-to-one product experience
A strong portfolio, GitHub contributions, or project demos
Willingness to collaborate closely with founders and cross-functional teams
Curiosity, creativity, and ability to learn quickly

Note: If a question is not applicable, write NA.