About the Role:
We are looking for a Data Scientist with 4–8 years of experience in developing Natural
Language Processing (NLP) and Generative AI (GenAI) solutions. The ideal candidate is
hands-on and able to rapidly research, design, and build state-of-the-art prototypes for both
internal R&D and live customer projects. Experience with Databricks (especially MLOps
Stacks) is highly desirable.
Key Responsibilities:
- Translate business challenges into solvable NLP and GenAI use cases, such as
  document understanding, web search, automated Q&A, summarization, and workflow
  automation.
- Stay updated with the latest GenAI/LLM advancements and evaluate them for
  feasibility and potential use.
- Design, build, and deploy LLM-powered retrieval-augmented generation (RAG)
  pipelines and agentic AI solutions, including multi-step reasoning systems,
  tool-using agents, and associated pipelines.
- Build basic UI frontends (e.g., using Streamlit, Flask) for internal demos or
  client-facing pilot GenAI applications.
- Apply MLOps best practices including MLflow-based tracking, Docker
  containerization, and CI/CD for GenAI pipelines.
- Develop customer demos and prototypes using the Databricks Mosaic AI suite.
- Contribute to both internal R&D efforts and customer implementations, including
  rapid POCs and scalable production deployments.
Required Qualifications:
- 4–8 years of implementation experience in machine learning, with a strong focus on NLP
  and GenAI applications in a customer-facing role.
- Must have productionized machine learning or deep learning models.
- Familiarity with SQL and experience working with large, complex datasets.
- Proficiency in Python and NLP/LLM libraries/tools such as HuggingFace Transformers,
  LangChain, LangGraph, LlamaIndex, etc.
- Practical experience with prompt engineering, chunking, vector embeddings, semantic
  search, RAG pipelines, and LLM fine-tuning.
- Understanding of GenAI-specific challenges, such as hallucination, prompt security,
  rate limits, and cost optimisation.
- Strong foundation in statistics, including:
  - Model assumptions and diagnostics
  - Evaluation metrics and error analysis
  - Probabilistic modelling, hypothesis testing, and uncertainty quantification
  - Feature importance and interpretability techniques
- Experience in MLOps tools and processes, including:
  - Model versioning and experiment tracking (e.g., MLflow)
  - Containerization (Docker)
  - CI/CD for ML workflows (e.g., GitHub Actions, Azure DevOps, or similar)
  - Model monitoring and retraining workflows
- Desirable: Hands-on experience with Databricks for model development and
  deployment.
- Desirable: Familiarity with cloud environments and the native AI/ML-related
  tools/services (Azure, AWS, or GCP).
- Strong analytical and communication skills, with a demonstrated ability to convert
  business requirements into NLP/GenAI solutions.
Educational Background:
- Bachelor’s or Master’s degree in Computer Science, Data Science, Mathematics,
  Statistics, Operational Research, or a related quantitative discipline.
- Relevant certifications (e.g., Databricks certifications, AWS/Azure/GCP AI/ML
  certifications) are a plus.
Workplace Flexibility:
- This is a hybrid role with remote flexibility.
- On-site presence at customer locations will be required based on the project and
  business needs. Candidates should be willing and able to travel for short or
  medium-term assignments when necessary.