Job Description
Location: Bangalore
Experience: 7+ years
Choosing Capgemini means joining a team where you'll be empowered to build cutting-edge AI infrastructure, supported by a collaborative global community, and inspired to reimagine what's possible. Join us in enabling scalable, fault-tolerant AI systems that power next-generation machine learning workloads.
Your Role
As an AI Runtime Engineer, you will design and optimize distributed AI runtimes that enable high-performance, multi-node, multi-GPU training at scale. You'll work closely with AI infrastructure teams to build elastic, fault-tolerant systems and ensure seamless orchestration for advanced AI workloads.
In this role, you will:
- Architect and implement distributed AI runtime systems with elastic scaling and job recovery.
- Optimize low-level performance (CUDA, NCCL, PyTorch internals) for multi-GPU workloads.
- Develop custom runtime architectures for large-scale AI training pipelines.
- Integrate orchestration tools such as Kubernetes, Ray, TorchElastic, and Horovod for containerized AI workloads.
- Implement fault recovery mechanisms and observability hooks for runtime health monitoring.
- Collaborate with AI researchers and platform engineers to ensure efficient resource utilization and throughput optimization.
- Contribute to CI/CD pipelines for AI infrastructure and runtime deployments.
Your Profile
Mandatory Skills:
- Hands-on experience in distributed training systems and multi-node/multi-GPU orchestration.
- Expertise in PyTorch internals, CUDA, NCCL, and performance profiling.
- Strong knowledge of Kubernetes, containerization, and orchestration frameworks.
Preferred Skills:
- Experience with TorchElastic, Ray, and Horovod.
- Open-source contributions to PyTorch or runtime libraries.
- Background in HPC, compilers, or systems research.
Education:
- Bachelor's/Master's in Computer Science, Engineering, or a related field.
About Us
At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world's most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and engineering services across all industries. Join us for a career full of opportunities. Where you can make a difference. Where no two days are the same.