Senior AI Engineer

5 - 9 years

0 Lacs

Posted: 19 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Company: Indian / Global Engineering & Manufacturing Organization

Key Skills: Machine Learning (ML), Artificial Intelligence (AI), TensorFlow, Python, PyTorch

Roles and Responsibilities:
- Design, build, and rigorously optimize the complete stack for large-scale model training, fine-tuning, and inference (data loading, distributed training, and model deployment) to maximize Model FLOPs Utilization (MFU) on compute clusters.
- Collaborate closely with research scientists to translate state-of-the-art models and algorithms into production-grade, high-performance code and scalable infrastructure.
- Implement, integrate, and test advancements from recent research publications and open-source contributions in enterprise-grade systems.
- Profile training workflows to identify and resolve bottlenecks across all layers of the stack, from input pipelines to inference, improving speed and resource efficiency.
- Contribute to the evaluation and selection of the hardware, software, and cloud platforms that will define the future AI infrastructure stack.
- Use MLOps tools (e.g., MLflow, Weights & Biases) to establish best practices across the entire AI model lifecycle, including development, validation, deployment, and monitoring.
- Maintain extensive documentation of infrastructure architecture, pipelines, and training processes to ensure reproducibility and smooth knowledge transfer.
- Continuously research and implement improvements in large-scale training strategies and data engineering workflows to keep the organization at the cutting edge.
- Demonstrate initiative and ownership in developing rapid prototypes and production-scale systems for AI applications in the energy sector.

Experience Requirement:
- 5-9 years of experience building and optimizing large-scale machine learning infrastructure, including distributed training and data pipelines.
- Proven hands-on expertise with deep learning frameworks such as PyTorch, JAX, or PyTorch Lightning in multi-node GPU environments.
- Experience scaling models trained on large datasets across distributed computing systems.
- Familiarity with writing and optimizing CUDA, Triton, or CUTLASS kernels for performance is preferred.
- Hands-on experience with AI/ML lifecycle management using MLOps frameworks and performance profiling tools.
- Demonstrated collaboration with AI researchers and data scientists to integrate models into production environments.
- A track record of open-source contributions in AI infrastructure or data engineering is a significant plus.

Education: M.E., B.Tech/M.Tech (Dual), BCA, B.E., B.Tech, M.Tech, MCA.
