4.0 - 8.0 years
5 - 9 Lacs
Bengaluru, Karnataka, India
On-site
We are looking for a highly motivated and skilled AI Software Architect to join our team. You will work with a team of Software Engineers to optimize DL models for inference and training, as well as libraries and applications, for Instinct GPUs in both on-prem and cloud environments. Candidates should be strong in Python and/or C++ and in GPU programming. Candidates should also have experience analyzing and optimizing the performance of AI software, understand hardware bottlenecks, and be able to push performance close to the roofline. Must be self-motivated and able to work well within a team environment.

KEY QUALIFICATIONS:
Strong programming skills in C++ and Python
Strong development experience in at least one major DL framework such as vLLM, PyTorch, or TensorFlow, in inference and/or fine-tuning and/or training on multi-node clusters
Solid experience in developing kernels, quantizing models, and hyperparameter optimization
Experience developing software and system-level performance optimizations, with a solid understanding of GPU architecture and roofline performance
MS with years of related experience or PhD with years of related experience in Computer Science, Computer Engineering, or a related field
Experience with open-source software development, including collaborating with community maintainers and submitting contributions, is a plus
Development experience in CK, Triton, and other GPU programming is a plus
Publications in reputed peer-reviewed ML conferences/journals are a plus
Excellent analytical and problem-solving skills for root-causing and addressing performance issues
Ability to work independently and as part of a team
Willingness to learn skills, tools, and methods to advance the quality, consistency, and timeliness of AMD AI products

PREFERRED EXPERIENCE:
Expertise in profiling tools across the AI SW stack (PyTorch Profiler, ROCm profiler, VTune, Nsight)
Experience in implementing and optimizing parallel methods on GPU accelerators (NCCL/RCCL, OpenMP, MPI)
Performance analysis skills for GPUs
Experience providing clear and timely communication to the leadership team on project status and other key aspects of the project
Posted 1 month ago