🚀 Role Overview
We’re hiring an AI Engineer Intern to join a fast-moving engineering team building production-ready ML systems. This role is hands-on: you’ll design, implement, and deploy model inference pipelines, optimize GPU utilization and memory, and help turn research prototypes into stable APIs that scale. Ideal for motivated college students (1st–4th year) who want deep practical experience with modern ML stacks.

🧠 What You’ll Do
- Prototype and evaluate ML models (Diffusion, VAE, Transformers, RLHF and related architectures).
- Build and test end-to-end model pipelines from preprocessing to inference and postprocessing.
- Develop an Inference API around models (REST/gRPC) and deploy to cloud/servers.
- Implement input-batching and dynamic batching strategies to maximize throughput and reduce latency.
- Optimize memory usage and GPU performance, including careful tensor placement, mixed precision, and memory profiling.
- Integrate model-serving frameworks and production tooling; optionally work with NVIDIA Triton for high-performance serving.
- Write reliable tests and monitoring for inference correctness, performance, and resource consumption.
- Collaborate with engineers and researchers to iterate on model improvements and deployment strategies.

🔧 Required Skills
- Proficient in Python with production-grade coding practices (testing, linting, packaging).
- Strong PyTorch expertise, including an understanding of its execution model, autograd, tensors, CUDA contexts, and memory behavior.
- Experience building inference pipelines and deploying model-backed APIs.
- Practical GPU knowledge: memory budgeting, mixed-precision (AMP), CUDA streams, and utilization trade-offs.
- Comfort with Linux command line, Docker, and basic cloud concepts.

✨ Nice-to-Have (Optional)
- NVIDIA Triton experience for model serving.
- Familiarity with HuggingFace Transformers, Diffusers, Accelerate, xformers.
- Experience with distributed inference, model sharding, or tensor parallelism.
- Background in profiling tools (Nsight, nvprof, PyTorch profiler) and memory tracing.
- Knowledge of RLHF, fine-tuning workflows, or advanced generative modeling tricks.

🎯 Who Should Apply
- College students in 1st, 2nd, 3rd, or 4th year eager to build production machine learning systems.
- People who enjoy bridging research and engineering, writing clean code, and iterating quickly.
- Problem-solvers who can balance correctness, speed, and resource constraints in real systems.

📍 Location and Time
Remote role with flexible hours. Must be able to collaborate during reasonable overlap with the team.

💸 Compensation
INR 10,000 – 15,000 per month.

✅ What We Offer
- Real responsibility and a chance to ship end-to-end features.
- Mentorship from senior ML engineers and researchers.
- Exposure to production-grade model serving, GPU optimization, and cloud deployment.
- A portfolio-building internship with measurable impact.

📩 How to Apply
Fill out the following Google form before 12th Nov 2025: https://forms.gle/k3kT98CNDu8urMSK8