Data Scientist - GenAI Applications
Position Overview
We are seeking a Data Scientist with 6+ years of experience to join our GenAI Applications team. This role leverages core data science expertise (statistical modeling, machine learning, and data analysis) to enhance and optimize generative AI solutions. The ideal candidate will bring strong foundational data science skills and apply them to the emerging field of generative AI.
Key Responsibilities
Statistical Analysis & Machine Learning
- Conduct statistical analysis and hypothesis testing to validate model performance and business impact
- Build and optimize traditional machine learning models (regression, classification, clustering, time series, etc.)
- Perform feature engineering, feature selection, and dimensionality reduction
- Design and execute A/B tests and controlled experiments to measure model effectiveness
- Develop predictive models and recommendation systems to support business decision-making
Data Analysis & Insights
- Analyze large, complex datasets to extract actionable insights and identify patterns
- Create comprehensive data visualizations and dashboards to communicate findings to stakeholders
- Perform exploratory data analysis (EDA) to understand data distributions, correlations, and anomalies
- Conduct cohort analysis, funnel analysis, and other business intelligence analyses
- Develop KPIs and metrics frameworks to measure the success of AI initiatives
GenAI Model Development & Implementation
- Apply data science methodologies to improve generative AI model performance and reliability
- Design experiments to evaluate and compare different generative AI approaches
- Implement fine-tuning strategies for pre-trained models using statistical optimization techniques
- Develop RAG (Retrieval-Augmented Generation) systems with a focus on data quality and retrieval accuracy
- Create evaluation frameworks and metrics specifically for generative AI applications
Research & Innovation
- Stay current with the latest developments in generative AI, including emerging architectures and techniques
- Experiment with novel approaches to improve model performance, efficiency, and safety
- Collaborate with research teams to translate cutting-edge research into practical applications
- Evaluate and benchmark different model architectures and frameworks
Cross-functional Collaboration
- Partner with product managers to translate business requirements into technical solutions
- Work closely with MLOps engineers to ensure scalable deployment and monitoring
- Collaborate with software engineers to integrate AI models into production applications
- Communicate complex technical concepts to stakeholders with varying technical backgrounds
Required Qualifications
Technical Skills
- Bachelor's, Master's, or Ph.D. in Computer Science, Data Science, Statistics, Mathematics, or a related field
- 6+ years of experience in data science with a strong foundation in statistical methods and machine learning
- Expert proficiency in Python/R and data science libraries (pandas, numpy, scikit-learn, statsmodels, matplotlib, seaborn)
- Strong knowledge of statistical modeling, regression analysis, and experimental design
- Experience with SQL and database management for large-scale data analysis
- Proficiency in data visualization tools (Tableau, Power BI, or similar)
- Knowledge of machine learning frameworks (PyTorch, TensorFlow, XGBoost, LightGBM)
- Experience with cloud platforms (AWS, GCP, Azure) and distributed computing
- MLOps proficiency with tools like MLflow, Kubeflow, Docker, and CI/CD pipelines
- Familiarity with cloud ML platforms such as AWS SageMaker, Google Vertex AI, or Azure ML
- Strong understanding of data preprocessing, feature engineering, and model validation techniques
GenAI & Advanced ML
- Hands-on experience applying data science techniques to natural language processing tasks
- Knowledge of deep learning architectures, particularly transformer models
- Experience with model evaluation metrics and statistical significance testing
- Understanding of bias detection, model interpretability, and responsible AI practices
- Familiarity with large language models and generative AI evaluation methodologies
Soft Skills
- Strong analytical and problem-solving abilities
- Excellent communication skills, with the ability to explain complex technical concepts
- Experience working in agile development environments
- Self-motivated, with the ability to work independently and in team settings
Preferred Qualifications
- Experience with advanced statistical methods (Bayesian analysis, causal inference, time series forecasting)
- Knowledge of optimization algorithms and mathematical programming
- Experience with multivariate testing and experimental design
- Background in natural language processing or computer vision from a data science perspective
- Experience with model compression and efficiency optimization techniques
- Experience leading data science projects and mentoring junior data scientists
- Data engineering experience including pipeline development, data preprocessing, and model deployment workflows