Job Description
You are an experienced DevOps / MLOps Engineer for Robotics, responsible for building and managing the infrastructure for data management, training, testing, deployment, telemetry, and fleet operations. You ensure that the pipelines supporting advanced robotics applications across humanoids, AGVs, cars, and drones are reliable, scalable, and secure.

Your key responsibilities include developing and maintaining data lakes with labeling and auto-labeling pipelines, designing feature stores, orchestrating training workflows using Kubernetes (K8s) and Slurm, managing experiment tracking and model registries, deploying models on edge devices with A/B testing and staged rollouts, and building systems for telemetry, health monitoring, remote assist, OTA updates, and security.

To excel in this role, you must hold a Master's or PhD in a relevant field and have 5-10 years of experience in infrastructure engineering for robotics or ML systems. You should have proven expertise in cloud and edge CI/CD pipelines, containerization, GPU scheduling, observability tools such as Prometheus and Grafana, artifact and version control, and reproducibility practices. Experience integrating real-time operating systems (RTOS) and familiarity with safety-certified build pipelines are an advantage.

Your success in this position will be measured by training throughput and cost efficiency, deployment latency and responsiveness, Mean Time to Recovery (MTTR), and rollout success rate across robotic fleets.

Your focus will vary by platform. For humanoids, you will concentrate on low-latency teleoperation and lab safety gate integration. For AGVs, you will handle facility-wide OTA updates under network constraints. For cars, you will be responsible for compliance logging and traceability in regulated environments. For drones, you will manage Beyond Visual Line of Sight (BVLOS) fleet telemetry and geofence updates.