3 - 7 years

0 Lacs

Posted: 3 weeks ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As an Ops Engineer at Gainwell, you will play a crucial role in the AI/ML team by developing, deploying, and maintaining scalable infrastructure and pipelines for Machine Learning models and Large Language Models (LLMs). You will collaborate closely with Data Scientists and DevOps to ensure smooth model lifecycle management, performance monitoring, version control, and compliance. You will have core responsibilities for LLM Ops and ML Ops:

**Key Responsibilities:**

**Core LLM Ops Responsibilities:**

- Develop and manage scalable deployment strategies tailored for LLMs (GPT, Llama, Claude, etc.).
- Optimize LLM inference performance, including model parallelization, quantization, pruning, and fine-tuning pipelines.
- Integrate prompt management, version control, and retrieval-augmented generation (RAG) pipelines.
- Manage vector databases, embedding stores, and document stores used with LLMs.
- Monitor hallucination rates and token usage, and optimize costs for LLM APIs or on-prem deployments.
- Continuously monitor models' performance and ensure alert systems are in place.
- Ensure compliance with ethical AI practices, privacy regulations, and responsible AI guidelines.

**Core ML Ops Responsibilities:**

- Design, build, and maintain robust CI/CD pipelines for ML model training, validation, deployment, and monitoring.
- Implement version control, model registry, and reproducibility strategies for ML models.
- Automate data ingestion, feature engineering, and model retraining workflows.
- Monitor model performance and drift, and ensure proper alerting systems are in place.
- Implement security, compliance, and governance protocols for model deployment.
- Collaborate with Data Scientists to streamline model development and experimentation.

**Qualifications Required:**

- Bachelor's or Master's degree in Computer Science, Data Sciences-Machine Learning, Engineering, or related fields.
- Strong experience with ML Ops tools (Kubeflow, MLflow, TFX, SageMaker, etc.).
- Experience with LLM-specific tools and frameworks.
- Proficient in deploying models in cloud (AWS, Azure, GCP) and on-prem environments.
- Strong coding skills in Python and Bash, and familiarity with infrastructure-as-code tools.
- Knowledge of healthcare AI applications and regulatory compliance.

Gainwell Technologies

Information Technology and Services

Los Angeles
