Umami Bioworks is a leading bioplatform for the development and production of sustainable planetary biosolutions. Through the synthesis of machine learning, multi-omics biomarkers, and digital twins, UMAMI has established market-leading capability for the discovery and development of cultivated bioproducts that can seamlessly transition to manufacturing with UMAMI’s modular, automated, plug-and-play production solution. By partnering with market leaders as their biomanufacturing solution provider, UMAMI is democratizing access to sustainable blue bioeconomy solutions that address a wide range of global challenges.

We’re a venture-backed biotech startup located in Singapore, where some of the world’s smartest, most passionate people are pioneering a sustainable food future that is attractive and accessible to people around the world. We are united by our collective drive to ask tough questions, take on challenging problems, and apply cutting-edge science and engineering to create a better future for humanity. At Umami Bioworks, you will be encouraged to dream big and will have the freedom to create, invent, and do the best, most impactful work of your career.

Umami Bioworks is looking to hire an inquisitive, innovative, and independent Machine Learning Engineer to join our R&D team in Bangalore, India, to develop scalable, modular ML infrastructure that integrates predictive and optimization models across biological and product domains. The role focuses on orchestrating models for media formulation, bioprocess tuning, metabolic modeling, and sensory analysis to drive data-informed R&D. The ideal candidate combines strong software engineering skills with experience in multi-model systems, and will collaborate closely with researchers to abstract biological complexity and improve predictive accuracy.
Responsibilities
- Design and build the overall architecture for a multi-model ML system that integrates distinct models (e.g., media prediction, bioprocess optimization, sensory profile, GEM-based outputs) into a unified decision pipeline
- Develop robust interfaces between sub-models to enable modularity, information flow, and cross-validation across stages (e.g., outputs of one model feeding into another)
- Implement model orchestration logic to allow conditional routing, fallback mechanisms, and ensemble strategies across different models
- Build and maintain pipelines for training, testing, and deploying multiple models across different data domains
- Optimize inference efficiency and reproducibility by designing clean APIs and containerized deployments
- Translate conceptual product flow into technical architecture diagrams, integration roadmaps, and modular codebases
- Implement model monitoring and versioning infrastructure to track performance drift, flag outliers, and allow comparison across iterations
- Collaborate with data engineers and researchers to abstract away biological complexity, so that the ML engineering work stays cleanly separated from domain-specific details
- Lead efforts to refactor and scale ML infrastructure for future integrations (e.g., generative layers, reinforcement learning modules)
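To give a concrete flavor of the orchestration logic described above, here is a minimal sketch of conditional routing with a fallback model. All model names, features, and thresholds are illustrative, not part of the actual UMAMI pipeline:

```python
# Minimal sketch of conditional routing with fallback across sub-models.
# Model names, features, and thresholds are hypothetical.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Prediction:
    value: float
    confidence: float  # 0..1, used for routing decisions


Model = Callable[[dict], Optional[Prediction]]


def media_model(features: dict) -> Optional[Prediction]:
    # Stand-in for a trained media-formulation predictor.
    if "glucose_g_per_l" not in features:
        return None  # abstain: cannot predict without a key input
    return Prediction(value=features["glucose_g_per_l"] * 0.8, confidence=0.9)


def fallback_model(features: dict) -> Optional[Prediction]:
    # Simple baseline used when the primary model abstains or is unsure.
    return Prediction(value=1.0, confidence=0.5)


def route(features: dict, primary: Model, fallback: Model,
          min_confidence: float = 0.7) -> Prediction:
    """Try the primary model; fall back when it abstains or confidence is low."""
    pred = primary(features)
    if pred is None or pred.confidence < min_confidence:
        return fallback(features)
    return pred
```

The same `route` pattern extends naturally to ensembles: instead of a single fallback, the router could aggregate several sub-models' outputs or feed one model's prediction into the next stage's feature set.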
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Machine Learning, Computational Biology, Data Science, or a related field
- Proven experience developing and deploying multi-model machine learning systems in a scientific or numerical domain
- Exposure to hybrid modeling approaches and/or reinforcement learning strategies
Experience
- Building and deploying multi-model systems
- Working with numerical/scientific (multi-modal) datasets
- Hybrid modelling and/or reinforcement learning approaches
Core Technical Skills
- Machine Learning Frameworks: PyTorch, TensorFlow, scikit-learn, XGBoost, CatBoost
- Model Orchestration: MLflow, Prefect, Airflow
- Multi-model Systems: Ensemble learning, model stacking, conditional pipelines
- Reinforcement Learning: RLlib, Stable-Baselines3
- Optimization Libraries: Optuna, Hyperopt, GPyOpt
- Numerical & Scientific Computing: NumPy, SciPy, pandas
- Containerization & Deployment: Docker, FastAPI
- Workflow Management: Snakemake, Nextflow
- ETL & Data Pipelines: pandas pipelines, PySpark
- Code & Data Versioning: Git
- API Design for modular ML blocks
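As a rough illustration of the "modular ML blocks" and "ensemble learning" items above, the sketch below puts interchangeable sub-models behind a shared interface. The class names are hypothetical; a production version would wrap real PyTorch or scikit-learn models and expose them through a FastAPI service layer:

```python
# Sketch of modular ML blocks behind a shared interface, plus a trivial
# ensemble combiner. All class names and formulas are illustrative.

from abc import ABC, abstractmethod


class ModelBlock(ABC):
    """Common interface every sub-model implements, so blocks are swappable."""

    @abstractmethod
    def predict(self, features: dict) -> float: ...


class BioprocessBlock(ModelBlock):
    def predict(self, features: dict) -> float:
        # Stand-in for a bioprocess-optimization model.
        return 2.0 * features.get("feed_rate", 0.0)


class SensoryBlock(ModelBlock):
    def predict(self, features: dict) -> float:
        # Stand-in for a sensory-profile model.
        return 0.5 * features.get("feed_rate", 0.0) + 1.0


class MeanEnsemble(ModelBlock):
    """Simplest ensemble strategy: average the member blocks' outputs."""

    def __init__(self, blocks: list[ModelBlock]):
        self.blocks = blocks

    def predict(self, features: dict) -> float:
        preds = [b.predict(features) for b in self.blocks]
        return sum(preds) / len(preds)
```

Because every block satisfies the same interface, an ensemble is itself a `ModelBlock` and can be nested inside larger pipelines, which is the property that makes conditional routing and model stacking composable.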
You will work directly with other members of our small but growing team to do cutting-edge science and will have the autonomy to test new ideas and identify better ways to do things.