Posted: 1 day ago
On-site
Full Time
About The Opportunity

We're a deep-tech innovator at the intersection of Artificial Intelligence, machine-learning infrastructure, and edge-to-cloud platforms. Our award-winning solutions let Fortune-500 enterprises build, train, and deploy large-scale AI models seamlessly, securely, and at lightning speed. As global demand for generative AI, RAG pipelines, and autonomous agents accelerates, we're scaling our MLOps team to keep our customers two steps ahead of the curve.

Role & Responsibilities

Own the full MLOps stack: design, build, and harden GPU-accelerated Kubernetes clusters across on-prem DCs and AWS/GCP/Azure for model training, fine-tuning, and low-latency inference.
Automate everything: craft IaC modules (Terraform/Pulumi) and CI/CD pipelines that deliver zero-downtime releases and reproducible experiment tracking.
Ship production-grade LLM workloads: optimize RAG/agent pipelines, manage model registries, and implement self-healing workflow orchestration with Kubeflow/Flyte/Prefect.
Eliminate bottlenecks: profile CUDA, resolve driver mismatches, and tune distributed frameworks (Ray, DeepSpeed) for multi-node scale-out.
Champion reliability: architect HA data lakes, databases, ingress/egress, DNS, and end-to-end observability (Prometheus/Grafana) targeting 99.99% uptime.
Mentor & influence: instill a platform-first mindset, codify best practices, and report progress and roadblocks directly to senior leadership.

Skills & Qualifications

Must-Have
5+ yrs DevOps/Platform experience with Docker & Kubernetes; expert bash/Python/Go scripting.
Hands-on experience building ML infrastructure for distributed GPU training and scalable model serving.
Deep fluency in cloud services (EKS/GKE/AKS), networking, load-balancing, RBAC, and Git-based CI/CD.
Proven mastery of IaC & config-management (Terraform, Pulumi, Ansible).

Preferred
Production experience with LLM fine-tuning, RAG architectures, or agentic workflows at scale.
Exposure to Kubeflow, Flyte, Prefect, or Ray; track record of setting up observability and data-lake pipelines (Delta Lake, Iceberg).

Skills: cloud services, containerization, automation tools, version control, DevOps
Albatronix