4.0 - 8.0 years
5 - 9 Lacs
Bengaluru, Karnataka, India
On-site
We are looking for a highly motivated and skilled AI Software Architect to join our team. You will work with a team of Software Engineers to optimize DL models for inference and training, along with libraries and applications, for Instinct GPUs in both on-prem and cloud environments. Candidates should be strong in Python and/or C++ and in GPU programming. Candidates should also have experience analyzing and optimizing the performance of AI software, understand hardware bottlenecks, and be able to tune performance close to the roofline. Must be self-motivated and able to work well within a team environment.

KEY QUALIFICATIONS:
Strong programming skills in C++ and Python
Strong development experience in at least one major DL framework, such as vLLM, PyTorch, or TensorFlow, for inference, fine-tuning, and/or training on multi-node clusters
Solid experience developing kernels, quantizing models, and optimizing hyperparameters
Experience developing software- and system-level performance optimizations, with a solid understanding of GPU architecture and roofline performance
MS with years of related experience or PhD with years of related experience in Computer Science, Computer Engineering, or a related equivalent
Experience with open-source software development, including collaborating with community maintainers and submitting contributions, is a plus
Development experience with CK, Triton, and other GPU programming models is a plus
Publications in reputed peer-reviewed ML conferences or journals are a plus
Excellent analytical and problem-solving skills for root-causing and addressing performance issues
Ability to work independently and as part of a team
Willingness to learn skills, tools, and methods to advance the quality, consistency, and timeliness of AMD AI products

PREFERRED EXPERIENCE:
Expertise in profiling tools across the AI software stack (PyTorch Profiler, ROCm profiler, VTune, Nsight)
Experience implementing and optimizing parallel methods on GPU accelerators (NCCL/RCCL, OpenMP, MPI)
Performance analysis skills for GPUs
Experience communicating project status and other key aspects of the project clearly and promptly to the leadership team
Posted 3 days ago