Posted: 1 month ago
On-site
Full Time
Job Information
Department Name: Platforms & Compilers
Job Type: Full time
Date Opened: 14/05/2025
Industry: Software Development
Minimum Experience (Years): 3
Maximum Experience (Years): 5
City: Saidapet
Province: Tamil Nadu
Country: India
Postal Code: 600089

About Us
MulticoreWare is a global software solutions and products company headquartered in San Jose, CA, USA. With offices worldwide, it serves clients and partners in North America, EMEA, and APAC. MulticoreWare has grown to serve its clients and partners on HPC and cloud computing, GPUs, multi-core and multithreaded CPUs, DSPs, FPGAs, and a variety of AI hardware accelerators.

MulticoreWare was founded by a team of researchers who wanted a better way to program heterogeneous architectures. With the advent of GPUs and the growing prevalence of multi-core, multi-architecture platforms, our clients were struggling to use these platforms efficiently. We started as a bootstrapped services company and have since expanded our portfolio to span products and services related to compilers, machine learning, video codecs, image processing, and augmented/virtual reality. Our hardware expertise has grown with our team; we now employ experts in HPC and cloud computing, GPUs, DSPs, FPGAs, and mobile and embedded platforms. We specialize in accelerating software and algorithms, so if your code targets a multi-core, heterogeneous platform, we can help.

Job Description

Job Summary
We are seeking an experienced GPU Programming Engineer to join our team. In this role, you will focus on developing, optimizing, and deploying GPU-accelerated solutions for high-performance machine learning workloads. The ideal candidate has strong expertise in GPU programming on one or more platforms (e.g., NVIDIA CUDA, AMD ROCm/HIP, or OpenCL) and is comfortable working at the intersection of parallel computing, performance tuning, and ML system integration.

Key Responsibilities
- Develop, optimize, and maintain GPU-accelerated components for machine learning pipelines using programming models such as CUDA, HIP, or OpenCL.
- Analyze and improve GPU kernel performance through profiling, benchmarking, and resource optimization.
- Optimize memory access patterns, compute throughput, and kernel execution to improve overall system performance on the target GPUs.
- Port existing CPU-based implementations to GPU platforms while ensuring correctness and scalable performance.
- Work closely with system architects, software engineers, and domain experts to integrate GPU-accelerated solutions.

Required Qualifications
- Bachelor's or master's degree in Computer Science, Electrical Engineering, or a related field.
- 3+ years of hands-on experience in GPU programming using CUDA, HIP, OpenCL, or other GPU compute APIs.
- Strong understanding of GPU architecture, memory hierarchy, and parallel programming models.
- Proficiency in C/C++ and hands-on experience developing on Linux-based systems.
- Familiarity with profiling and tuning tools such as Nsight, rocprof, or Perfetto.

Preferred Qualifications
- Familiarity with cuDNN, TensorRT, OpenCL, or other GPU computing libraries.
MulticoreWare
Saidapet, Chennai, Tamil Nadu
Salary: Not disclosed