
ML Platform Engineer

eBay

5 years

0 Lacs

Karnataka

Posted: 4 days ago | Platform: Glassdoor


Skills Required

ML, connect, AI, power, learning, model, experimentation, deployment, reliability, support, Kubernetes, monitoring, training, inference, collaboration, scalability, usability, triage, resolve, scaling, compliance, service, scheduling, networking, orchestration, OpenTelemetry, automation, analysis, engineering, DevOps, autoscaling, data, configuration, CUDA, Python, diagnostics, tooling, AWS, GCP, Azure, management, TensorFlow, PyTorch, integration, debugging, communication, research, accessibility

Work Mode

On-site

Job Type

Part Time

Job Description

At eBay, we're more than a global ecommerce leader — we’re changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We’re committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.

Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.

Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.

At eBay, we are building the next-generation AI platform to power intelligent experiences for millions of users worldwide. Our AI Platform (AIP) provides the scalable, secure, and efficient foundation for deploying and optimizing advanced machine learning and large language model (LLM) workloads at production scale. We enable teams across eBay to move from experimentation to global deployment with speed, reliability, and efficiency.

We are seeking an experienced Machine Learning Platform Support Engineer to join our AI Platform team. In this role, you will be the first line of support (L1) for ML workloads running on Kubernetes and Ray.io clusters. You will be responsible for triaging, monitoring, and resolving platform-related issues across ML training, inference, model deployment, and GPU resource allocation.

This position includes participation in on-call rotations (PagerDuty) and requires close collaboration with ML Platform engineers, researchers, and platform teams to ensure the reliability, scalability, and usability of the AI Platform. You will play a critical role in ensuring operational excellence and maintaining the uptime of the core infrastructure that powers eBay’s global AI and ML systems.

What you will accomplish

Serve as the first point of contact (L1) for all support requests related to the AI/ML Platform, including ML training, inference, model deployment, and GPU allocation.
Provide operational and on-call (PagerDuty) support for Ray.io and Kubernetes clusters running distributed ML workloads across cloud and on-prem environments.
Monitor, triage, and resolve platform incidents involving job failures, scaling errors, cluster instability, or GPU resource contention.
Manage GPU quota allocation and scheduling across multiple user teams, ensuring compliance with approved quotas and optimal resource utilization.
Support Ray Train/Tune for large-scale distributed training and Ray Serve for autoscaled inference, maintaining performance and service reliability.
Troubleshoot Kubernetes workloads, including pod scheduling, networking, image issues, and resource exhaustion in multi-tenant namespaces.
Collaborate with platform engineers, SREs, and ML practitioners to resolve infrastructure, orchestration, and dependency issues impacting ML workloads.
Improve observability, monitoring, and alerting for Ray and Kubernetes clusters using Prometheus, Grafana, and OpenTelemetry to enable proactive issue detection.
Maintain and enhance runbooks, automation scripts, and knowledge base documentation to accelerate incident resolution and reduce recurring support requests.
Participate in root cause analysis (RCA) and post-incident reviews, contributing to platform improvements and automation initiatives to minimize downtime.

What you will bring

Bachelor’s or Master’s degree in Computer Science, Engineering, or related technical discipline (or equivalent experience).
5+ years of experience in ML operations, DevOps, or platform support for distributed AI/ML systems.
Proven experience providing L1/L2 and on-call support for Ray.io and Kubernetes-based clusters supporting ML training and inference workloads.
Strong understanding of Ray cluster operations, including autoscaling, job scheduling, and workload orchestration across heterogeneous compute (CPU/GPU/accelerators).
Hands-on experience managing Kubernetes control plane and data plane components, multi-tenant namespaces, RBAC, ingress, and resource isolation.
Expertise in GPU scheduling, allocation, and monitoring (NVIDIA device plugin, MIG configuration, CUDA/NCCL optimization).
Proficiency in Python and/or Go for automation, diagnostics, and operational tooling in distributed environments.
Working knowledge of Kubernetes and cloud-native environments (AWS, GCP, Azure) and CI/CD pipelines.
Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and incident management tools (PagerDuty, ServiceNow).
Familiarity with ML frameworks such as TensorFlow and PyTorch, and their integration within distributed Ray/Kubernetes clusters.
Strong debugging, analytical, and communication skills to collaborate effectively with cross-functional engineering and research teams.
A customer-centric, operationally disciplined mindset focused on maintaining platform reliability, performance, and user satisfaction.

Please see the Talent Privacy Notice for information regarding how eBay handles your personal data collected when you use the eBay Careers website or apply for a job with eBay.

eBay is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, veteran status, disability, or other legally protected status. If you have a need that requires accommodation, please contact us at talent@ebay.com. We will make every effort to respond to your request for accommodation as soon as possible. View our accessibility statement to learn more about eBay's commitment to ensuring digital accessibility for people with disabilities.
