Job Overview:
The Senior Lead Data Scientist is a senior technical leader responsible for designing and developing advanced data and AI solutions, with a strategic focus on using Generative AI technologies (e.g., large language models, diffusion models, multi-modal systems) to solve complex business problems. This role combines deep expertise in data science, machine learning, and AI architecture with a strong understanding of product strategy, cross-functional leadership, and ethical AI deployment.
Key Responsibilities:
- Design and implement advanced solutions utilizing Large Language Models (LLMs).
- Demonstrate self-driven initiative by taking ownership and creating end-to-end solutions.
- Conduct research and stay informed about the latest developments in generative AI and LLMs.
- Develop and maintain code libraries, tools, and frameworks to support generative AI development.
- Participate in code reviews and contribute to maintaining high code quality standards.
- Possess strong analytical and problem-solving skills.
- Demonstrate excellent communication skills and the ability to work effectively in a team environment.
Skills & Qualifications:
Must Have Skills:
- 7 to 12 years of experience in IT.
- Natural Language Processing (NLP): Hands-on experience in use case classification, topic modeling, Q&A and chatbots, search, Document AI, summarization, and content generation.
- Computer Vision and Audio: Hands-on experience in image classification, object detection, segmentation, image generation, and audio and video analysis.
- Generative AI: Proficiency with SaaS LLMs and related tooling, including LangChain, LlamaIndex, vector databases, and prompt engineering (CoT, ToT, ReAct, agents). Experience with Azure OpenAI, Google Vertex AI, and AWS Bedrock for text/audio/image/video modalities.
- Familiarity with open-source LLMs, tools such as TensorFlow/PyTorch and Hugging Face, and techniques such as RLHF.
- Cloud: Hands-on experience with cloud platforms such as Azure, AWS, and GCP.
- Application Development: Proficiency in Python, Docker, FastAPI/Django/Flask, and Git.
Tech Skills:
Machine Learning (ML) & Deep Learning:
- Solid understanding of supervised and unsupervised learning.
- Proficiency with deep learning architectures such as Transformers, LSTMs, and RNNs.
Generative AI:
- Hands-on experience with generative models such as OpenAI's models and Gemini.
- Knowledge of optimizing large language models (LLMs) for specific tasks.
Natural Language Processing (NLP):
- Expertise in NLP techniques, including text preprocessing, tokenization, embeddings, and sentiment analysis.
- Familiarity with NLP tasks such as text classification, summarization, translation, and question-answering.
Retrieval-Augmented Generation (RAG):
- In-depth understanding of RAG pipelines, including knowledge retrieval techniques such as dense and sparse retrieval.
- Experience integrating generative models with external knowledge bases or databases to augment responses.
Search and Retrieval Systems:
- Experience building or integrating search and retrieval systems using tools such as Elasticsearch, AI Search, and ChromaDB.
Prompt Engineering:
- Expertise in crafting, refining, and optimizing prompts to improve model output quality and ensure desired results.
- Understanding of how to guide large language models (LLMs) toward specific outcomes using different prompt formats, strategies, and constraints.
- Knowledge of techniques such as zero-shot, one-shot, and few-shot prompting, and of using system and user prompts to improve model performance.
Programming & Libraries:
- Proficiency in Python and libraries such as PyTorch and Hugging Face.
- Knowledge of version control (Git) and cloud platforms (AWS, GCP, Azure).
APIs & Integration:
Evaluation & Benchmarking:
Good to Have Skills:
- Advanced Degree: Master's degree in computer science or a relevant field.
- Life Sciences Experience: Experience in the life sciences/healthcare industry.
- Azure Certification: Azure Cloud experience/certification.
- Experience with multi-modal AI models (text-to-image, text-to-video, speech synthesis, etc.).
- Knowledge of Knowledge Graphs and Symbolic AI.
- Understanding of MLOps and LLMOps for deploying scalable AI solutions.