1 DL Framework Job


4.0 - 8.0 years

5 - 9 Lacs

Bengaluru, Karnataka, India

On-site

We are looking for a highly motivated and skilled AI Software Architect to join our team. You will work with a team of software engineers to optimize DL models for inference and training, along with libraries and applications, for Instinct GPUs in both on-prem and cloud environments. Candidates should be strong in Python and/or C++ and GPU programming. Candidates should also have experience analyzing and optimizing the performance of AI software, understand hardware bottlenecks, and know how to push performance close to the roofline. Must be self-motivated and possess the ability to work well within a team environment.

KEY QUALIFICATIONS:
Strong programming skills in C++ and Python
Strong development experience in at least one major DL framework such as vLLM, PyTorch, or TensorFlow, in inference, fine-tuning, and/or training on multi-node clusters
Solid experience in developing kernels, quantizing models, and hyperparameter optimization
Experience developing software and system-level performance optimizations, with a solid understanding of GPU architecture and roofline performance
MS with years of related experience or PhD with years of related experience in Computer Science, Computer Engineering, or a related equivalent field
Experience with open-source software development, including collaboration with community maintainers and submitting contributions, is a plus
Development experience in CK, Triton, and other GPU programming is a plus
Publications in reputed peer-reviewed ML conferences/journals are a plus
Excellent analytical and problem-solving skills for root-causing and addressing performance issues
Ability to work independently and as part of a team
Willingness to learn skills, tools, and methods to advance the quality, consistency, and timeliness of AMD AI products

PREFERRED EXPERIENCE:
Expertise in profiling tools across the AI SW stack (PyTorch Profiler, ROCm profiler, VTune, Nsight)
Experience in implementing and optimizing parallel methods on GPU accelerators (NCCL/RCCL, OpenMP, MPI)
Performance analysis skills for GPUs
Experience providing clear and timely communication on project status and other key aspects of the project to the leadership team

Posted 3 days ago
