ML GPU Kernel Development Engineer

2 - 7 years

7 - 12 Lacs

Posted: 2 days ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

  • Design and implement highly optimized ML kernels (e.g., matrix operations, attention mechanisms) for AMD GPUs using ROCm.
  • Profile, debug, and tune kernel performance to maximize hardware utilization for AI workloads.
  • Collaborate with ML researchers and framework developers to integrate kernels into AI frameworks (e.g., PyTorch, TensorFlow) and inference engines (e.g., vLLM, SGLang).
  • Contribute to the ROCm software stack by identifying and resolving bottlenecks in libraries like MIOpen, BLAS, or Composable Kernel.
  • Stay updated on the latest AI/ML trends (LLMs, quantization, distributed inference) and apply them to kernel development.
  • Document and communicate technical designs, benchmarks, and best practices.
  • Troubleshoot and resolve issues related to GPU compatibility, performance, and scalability.
REQUIRED EXPERIENCE:
  • 2+ years of experience in GPU kernel development for machine learning (ROCm or CUDA).
  • Proficiency in C/C++ and Python, with experience in performance-critical programming.
  • Strong understanding of ML frameworks (PyTorch, TensorFlow) and GPU-accelerated libraries.
  • Basic knowledge of modern AI technologies (LLMs, transformers, inference optimization).
  • Familiarity with parallel computing, memory optimization, and hardware architectures.
  • Problem-solving skills and ability to work in a fast-paced environment.
PREFERRED EXPERIENCE:
  • Direct experience with AMD ROCm development (HIP, MIOpen, Composable Kernel).
  • Knowledge of LLM-specific optimizations (e.g., FlashAttention, PagedAttention in vLLM).
  • Experience with distributed training/inference or model compression techniques.
  • Contributions to open-source ML projects or GPU compute libraries.

Advanced Micro Devices, Inc

Semiconductors

Sunnyvale
