ML Ops Engineer 4 - GCP [T500-20226]


Posted: 2 days ago | Platform: LinkedIn


Work Mode: On-site

Job Type: Full Time

Job Description

About Costco Wholesale

Costco Wholesale is a multi-billion-dollar global retailer with warehouse club operations in eleven countries. They provide a wide selection of quality merchandise, plus the convenience of specialty departments and exclusive member services, all designed to make shopping a pleasurable experience for their members.

About Costco Wholesale India

At Costco Wholesale India, we foster a collaborative space, working to support Costco Wholesale in developing innovative solutions that improve members’ experiences and make employees’ jobs easier. Our employees play a key role in driving and delivering innovation to establish IT as a core competitive advantage for Costco Wholesale.


Position Title: ML Ops Engineer 4

Roles & Responsibilities:

  • Define the long-term vision and strategy for MLOps initiatives: Set the direction for the organization’s MLOps, model deployment, and monitoring practices.
  • Lead and manage a team of MLOps engineers: Provide technical guidance, mentorship, and career development for team members.
  • Identify and explore cutting-edge research areas and technologies: Stay abreast of the latest advancements in MLOps, model serving, and AI operations.
  • Drive innovation and the development of novel MLOps solutions: Lead efforts, prototype new approaches, and oversee implementation of advanced MLOps platforms.
  • Design and manage scalable ML infrastructure and pipelines on GCP; oversee model deployment (A/B testing, rollouts/rollbacks, auto-scaling), and establish monitoring/observability (performance, drift, KPIs).
  • Ensure ML operations meet governance, security, compliance, and disaster recovery standards across the organization.
  • Collaborate with executive leadership on strategic decision-making: Align MLOps initiatives with business objectives and organizational priorities.
  • Establish and enforce MLOps standards and best practices: Ensure quality, reproducibility, and security of ML systems across the organization.
  • Represent the organization in external MLOps communities: Speak at conferences, publish thought leadership, and build partnerships with academia and industry.


Technical Skills:

  • 12+ years of experience
  • Mastery of relevant technical skills: Deep expertise in MLOps, model deployment, monitoring, and governance.
  • Significant experience in designing and implementing complex MLOps systems at scale: Lead the architecture and deployment of large-scale MLOps platforms on GCP.
  • Hands-on experience architecting large-scale ML platforms on GCP (Vertex AI, GKE, Dataflow, BigQuery, Pub/Sub, Cloud Composer), implementing experiment tracking (MLflow, Weights & Biases, TensorBoard), feature stores (Vertex AI), data pipelines and workflow orchestration, and ensuring cloud security, compliance, disaster recovery, and cost optimization.
  • Strong leadership and team management skills: Build, mentor, and lead high-performing MLOps teams.
  • Excellent strategic thinking and problem-solving abilities: Translate business challenges into scalable, reliable MLOps solutions.
  • Exceptional communication and influencing skills: Advocate for MLOps initiatives, influence executive decisions, and represent the organization externally through conferences, publications, and industry engagement.


Must Have Skills:

  • Deep expertise in MLOps, model deployment, monitoring, and governance
  • Experience building scalable MLOps platforms on GCP
  • Proficiency with CI/CD for ML, containerization (e.g. Docker, Kubernetes), IaC (Terraform), and orchestration
  • Leadership in MLOps strategy, standards, and cross-team collaboration
  • Hands-on expertise with GCP ML and data services (Vertex AI, Dataflow, BigQuery, Pub/Sub, Cloud Composer, GKE)
  • Experience implementing model observability (performance monitoring, drift detection, dashboards, and alerts)
  • Proficiency with experiment tracking (MLflow, W&B) and feature store management
  • Knowledge of cloud security, compliance, and cost optimization strategies
