
229 CUDA Jobs - Page 10

JobPe aggregates job listings for easy access; you apply directly on the original job portal.

10 - 15 years

12 - 17 Lacs

Bengaluru

Work from Office

Source: Naukri

About the Job: The Data Development Insights & Strategy (DDIS) team at Red Hat is seeking an AI Engineering Manager to lead a talented team of AI Engineers focused on the design, deployment, and optimization of AI model lifecycle frameworks within our OpenShift AI and RHEL AI infrastructures. As an AI Engineering Manager, you will be responsible for driving the technical vision and execution of AI model lifecycle management at scale, overseeing the development and deployment of cutting-edge AI technologies while ensuring the scalability, performance, and security of mission-critical AI models.

In this leadership role, you will work closely with cross-functional teams, including Products Global Engineering (PGE) and IT AI Infra teams, to drive the deployment, maintenance, and optimization of AI models and infrastructure, ensuring alignment with business objectives and strategic goals. You will manage and mentor a high-performing team of AI Engineers, driving innovation, setting technical priorities, and fostering a collaborative and growth-oriented team culture. This is an ideal role for someone with a strong background in AI/ML, MLOps, and leadership who is looking to have a significant impact on Red Hat's AI strategy and innovations.

What you will do

  • Lead and manage a team of AI Engineers, providing mentorship and guidance and fostering a culture of continuous learning, collaboration, and technical excellence.
  • Define and execute the technical strategy for AI model lifecycle management, ensuring the scalability, security, and optimization of AI models within Red Hat's OpenShift and RHEL AI infrastructures.
  • Oversee the development, deployment, and maintenance of AI models, working with engineering teams to ensure seamless integration, minimal downtime, and high availability in production environments.
  • Drive the implementation of automation, CI/CD pipelines, and Infrastructure as Code (IaC) practices to streamline AI model deployment, updates, and monitoring.
  • Collaborate with cross-functional teams (PGE, IT AI Infra, etc.) to ensure that AI models and infrastructure meet evolving business needs, data changes, and emerging technology trends.
  • Manage and prioritize the resolution of feature requests (RFEs), ensuring timely, transparent communication and effective problem resolution.
  • Guide the optimization of large-scale models, including foundation models such as Mistral and Llama, and ensure optimal computational resource management (e.g., GPU optimization, cost management strategies).
  • Lead efforts to monitor and enhance AI model performance, using tools such as OpenLLMetry, Splunk, and Catchpoint to identify and resolve performance bottlenecks.
  • Define and track key performance metrics for AI models, ensuring that model updates and releases meet business expectations and deadlines (e.g., quarterly releases, RFEs resolved within 30 days).
  • Foster collaboration between teams so that model updates and optimizations align with both business objectives and technological advancements.
  • Promote innovation by staying up to date with emerging AI technologies, tools, and industry trends, and by integrating these advancements into Red Hat's AI infrastructure.
  • Take ownership of the team's growth and professional development, ensuring engineers are continuously challenged and supported in their career progression.
What you will bring

  • A bachelor's or master's degree in Computer Science, Data Science, Machine Learning, or a related technical field; hands-on experience and demonstrated leadership in AI engineering and MLOps may be considered in lieu of formal academic credentials.
  • 10+ years of experience in AI engineering, MLOps, or related fields, including at least 3 years of leadership experience, with a strong background in managing high-performing engineering teams and mentoring Principal and Senior Engineers.
  • The ability to foster a culture of technical excellence, continuous improvement, and innovation within the team.
  • Expertise in deploying, maintaining, and optimizing AI models at scale across cloud environments such as AWS, GCP, or Azure, and containerized platforms like OpenShift or Kubernetes.
  • Experience with AI/ML frameworks, performance monitoring, and resource optimization (e.g., CUDA, MIG, vLLM, TGI) to ensure that AI models are efficient, scalable, and high-performing.
  • Hands-on experience with Infrastructure as Code (IaC) practices, CI/CD tools (Git, Jenkins, Terraform), and automating AI model deployment and monitoring pipelines.
  • Strong problem-solving skills for optimizing and troubleshooting large-scale AI systems and distributed architectures.
  • Excellent communication skills, with the ability to interact effectively with both technical and non-technical stakeholders.

Desired skills

  • 10+ years of experience in AI, MLOps, or related fields, including 3+ years of leadership experience.
  • Experience managing large-scale AI infrastructure, particularly in high-performance computing environments.
  • Deep expertise in AI model lifecycle management, from development to deployment, monitoring, and performance optimization.
  • A strong background in cross-functional collaboration, driving alignment between business objectives, engineering teams, and technical requirements.
  • Proven ability to innovate, set technical direction, and deliver AI infrastructure improvements at scale.

As an AI Engineering Manager at Red Hat, you will have the opportunity to shape the future of AI model lifecycle management at scale, influence strategic initiatives, and drive innovation across a high-performing engineering team. If you're a dynamic leader with a passion for AI and machine learning and want to make a significant impact on Red Hat's AI infrastructure, we encourage you to apply.

About Red Hat

Red Hat is the world's leading provider of enterprise software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates work flexibly across work environments, from in-office to office-flex to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact.

Diversity, Equity & Inclusion at Red Hat

Red Hat's culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from diverse backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation.
Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions of diversity that compose our global village.

Equal Opportunity Policy (EEO)

Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law. Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees, commissions, or any other payment related to unsolicited resumes or CVs except as required in a written contract between Red Hat and the recruitment agency or party requesting payment of a fee. Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email . General inquiries, such as those regarding the status of a job application, will not receive a reply.

Posted 3 months ago

Apply

2 - 5 years

0 Lacs

Chennai, Tamil Nadu, India

Source: LinkedIn

Job Title: Lead Consultant - HPC Application Engineer
Career Level: E

Introduction to role

Join our dynamic Research Data & Analytics Team within R&D IT, a global group of skilled data and AI engineers dedicated to revolutionizing the way we discover and develop medicine. Our mission is to partner with scientific teams to deliver innovative capabilities, products, and platforms that accelerate the development of safe and effective medicines for patients.

Scientific Computing Platform

The Scientific Computing Platform (SCP) is a key component of our efforts, providing high-performance computing (HPC) solutions that support computational chemistry, imaging, multi-OMICs, structural biology, data science, and AI.

About The Platform

The SCP team provides the high-performance computing (HPC) platform and optimised applications on which scientists build their workflows. We are driven to accelerate scientific discovery, and we achieve this through rapid deployment of applications, optimisation of complex workflows, and application tuning for very large problems. An overarching principle is to maximise the impact of the team's support efforts.

We are seeking a passionate HPC engineer focussed on applications and research software engineering. The ideal candidate will have extensive hands-on experience making an impact with HPC technology, will deliver HPC services to a high quality, and will be able to relate to the scientific community and work closely with users to make the best use of research computing services.

The HPC landscape is continually evolving. You will need the skills to help build, optimise, and operate industry-leading capabilities, including application build frameworks, containerised applications, and cloud software-as-a-service. Automated deployment is a key feature of your work, and you will need to be comfortable with DevOps processes and delivering consistency through automation and infrastructure-as-code. A strong focus of the role will also be working directly with scientific users to help them optimise and productionise their code and make best use of the facility.

Accountabilities

As an HPC Application Engineer, you will be responsible for developing, delivering, and operating research computing services and applications. You will take a Site Reliability Engineering approach to manage the end-to-end development, deployment, monitoring, and incident response of HPC services. Your role will involve solving complex technical problems related to SCP applications and services, as well as assisting users in debugging and optimizing their workflows and applications. You will work closely with scientific users to help them optimize and productionize their code, ensuring they make the best use of our research computing services.
Essential Skills/Experience

  • Scientific application installation, optimisation, and configuration
  • Effective use of HPC job schedulers such as SLURM
  • Experience working in a Linux environment
  • Competence in multiple programming and scripting languages from the following list: Python, R, shell scripts, C/C++, Golang, with deep expertise in at least one of them
  • Deep understanding of the factors influencing HPC application performance
  • Highly customer focused; able to explain IT technical concepts in a manner which non-IT experts can understand

Desirable Skills/Experience

  • Scientific degree, and/or experience in computationally intensive analysis of scientific data
  • Previous experience in high performance computing (HPC) environments, especially at large scales (>10,000 cores)
  • Experience with high performance parallel filesystems at petabyte scale, e.g. GPFS, Lustre
  • Hands-on knowledge of a range of scientific and HPC applications such as simulation software, bioinformatics tools, or 3D data visualisation packages
  • Experience with software build frameworks such as EasyBuild or Spack
  • Expertise in GPU and AI/ML tools and frameworks (CUDA, TensorFlow, PyTorch)
  • Strong understanding of parallel programming techniques (e.g. MPI, pthreads, OpenMP) and code profiling/optimisation
  • Experience with workflow engines (e.g. Apache Airflow, Nextflow, Cromwell, AWS Step Functions)
  • Familiarity with container runtimes such as Docker, Singularity, or enroot
  • Expertise in specific scientific domains relevant to early drug development, such as deep learning, medical imaging, molecular dynamics, or 'omics
  • Experience with frameworks for regression tests and benchmarks for HPC applications, like ReFrame HPC
  • Experience working in GxP-validated environments
  • Experience administering and optimising an HPC job scheduler (e.g. SLURM)
  • Experience with configuration automation and infrastructure as code (e.g. Ansible, HashiCorp Terraform, AWS CloudFormation, AWS Cloud Development Kit)
  • Experience deploying infrastructure and code to public cloud, especially AWS
  • Hands-on experience working in a DevOps team and using agile methodologies

At AstraZeneca, we use technology to impact patients' lives directly by transforming our ability to develop life-changing medicines. We empower our teams to perform at their peak by combining cutting-edge science with leading digital technology platforms and data. Our dynamic environment encourages innovation and ownership, providing countless opportunities to learn and grow. Join us in our mission to reinvent the industry and make a meaningful impact on the world. Ready to make a difference? Apply now!

Posted 4 months ago

Apply

5 years

0 Lacs

Hyderabad, Telangana, India

Hybrid

Source: LinkedIn

NVIDIA has continuously reinvented itself. Our invention of the GPU sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. Today, research in artificial intelligence is booming worldwide, calling for the highly scalable and massively parallel computational horsepower at which NVIDIA GPUs excel. NVIDIA is a "learning machine" that constantly evolves by adapting to new opportunities that are hard to solve, that only we can address, and that matter to the world. This is our life's work: to amplify human creativity and intelligence. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join our diverse team and see how you can make a lasting impact on the world!

As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of groundbreaking GPU compute clusters that power all AI research across NVIDIA. We seek an expert to build and operate these clusters at high reliability, efficiency, and performance, and to drive foundational improvements and automation that raise researchers' productivity.

As a Site Reliability Engineer, you are responsible for the big picture of how our systems relate to each other, and we use a breadth of tools and approaches to tackle a broad spectrum of problems. Practices such as limiting time spent on reactive operational work, blameless postmortems, and proactive identification of potential outages factor into the iterative improvement that is key to both product quality and interesting, dynamic day-to-day work. SRE's culture of diversity, intellectual curiosity, problem solving, and openness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow.

What You'll Be Doing

In this role you will build and improve our ecosystem around GPU-accelerated computing, including developing large-scale automation solutions. You will also maintain and build deep learning AI/HPC GPU clusters at scale and support our researchers in running their flows on our clusters, including performance analysis and optimization of deep learning workflows. You will design, implement, and support the operational and reliability aspects of large-scale distributed systems, with a focus on performance at scale, real-time monitoring, logging, and alerting.
  • Design and implement state-of-the-art GPU compute clusters.
  • Optimize cluster operations for maximum reliability, efficiency, and performance.
  • Drive foundational improvements and automation to enhance researcher productivity.
  • Troubleshoot, diagnose, and root-cause system failures, isolating components and failure scenarios while working with internal and external partners.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems, and be part of an on-call rotation to support production systems.
  • Write and review code, develop documentation and capacity plans, and debug the hardest problems, live, on some of the largest and most complex systems in the world.
  • Implement remediations across the software and hardware stack according to plan, keeping a thorough procedural record and data log, and manage upgrades and automated rollbacks across all clusters.

What We Need To See

  • Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience, with a minimum of 5+ years designing and operating large-scale compute infrastructure.
  • Proven experience in site reliability engineering for high-performance computing environments, with operational experience on a cluster of at least 2K GPUs.
  • Deep understanding of GPU computing and AI infrastructure.
  • Passion for solving complex technical challenges and optimizing system performance.
  • Experience with AI/HPC advanced job schedulers, ideally including familiarity with schedulers such as Slurm.
  • Working knowledge of cluster configuration management tools such as BCM or Ansible, and of infrastructure-level applications such as Kubernetes, Terraform, MySQL, etc.
  • In-depth understanding of container technologies like Docker, Enroot, etc.
  • Experience programming in Python and Bash scripting.

Ways To Stand Out From The Crowd

  • Interest in crafting, analyzing, and fixing large-scale distributed systems.
  • Familiarity with NVIDIA GPUs, CUDA programming, NCCL, MLPerf benchmarking, and InfiniBand with IPoIB and RDMA.
  • Experience with cloud deployment, BCM, and Terraform.
  • Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads.
  • Multi-cloud experience.

JR1993564

Posted 5 months ago

Apply

0.0 - 2.0 years

0 Lacs

Chennai, Tamil Nadu

On-site

Source: Indeed

Job Information

  • Department Name: Platforms & Compilers
  • Job Type: Full time
  • Date Opened: 23/10/2024
  • Industry: Software Development
  • Minimum Experience In Years: 7
  • Maximum Experience In Years: 10
  • City: Chennai
  • Province: Tamil Nadu
  • Country: India
  • Postal Code: 600086

About Us

MulticoreWare is a global software solutions & products company with its HQ in San Jose, CA, USA. With worldwide offices, it serves its clients and partners in the North America, EMEA, and APAC regions. Started by a group of researchers, MulticoreWare has grown to serve its clients and partners on HPC & cloud computing, GPUs, multicore & multithreaded CPUs, DSPs, FPGAs, and a variety of AI hardware accelerators.

MulticoreWare was founded by a team of researchers who wanted a better way to program for heterogeneous architectures. With the advent of GPUs and the increasing prevalence of multi-core, multi-architecture platforms, our clients were struggling with the difficulties of using these platforms efficiently. We started as a boot-strapped services company and have since expanded our portfolio to span products and services related to compilers, machine learning, video codecs, image processing, and augmented/virtual reality. Our hardware expertise has also expanded with our team; we now employ experts on HPC and cloud computing, GPUs, DSPs, FPGAs, and mobile and embedded platforms. We specialize in accelerating software and algorithms, so if your code targets a multi-core, heterogeneous platform, we can help.

Job Description

Key Responsibilities:

  • Implement and optimize machine learning, computer vision, and numeric libraries for target hardware architectures, including CPUs, GPUs, DSPs, and other accelerators.
  • Work closely with software and hardware engineers to ensure optimal performance on target platforms.
  • Implement low-level optimizations, including algorithmic modifications, parallelization, vectorization, and memory access optimizations, to fully leverage the capabilities of the target hardware architectures.
  • Work with customers to understand their requirements and implement libraries to meet their needs.
  • Develop performance benchmarks and conduct performance analysis to ensure the optimized libraries meet the required performance targets.
  • Stay current with the latest advancements in machine learning, computer vision, and high-performance computing.

Qualifications:

  • BTech/BE/MTech/ME/MS/PhD degree in CSE/IT/ECE
  • > 2 years of experience working in algorithm development, porting, optimization & testing
  • Proficiency in programming languages such as C/C++, CUDA, OpenCL, or other languages relevant to hardware optimization.
  • Hands-on experience with hardware architectures, including CPUs, GPUs, DSPs, and accelerators, and familiarity with their programming models and optimization techniques.
  • Knowledge of parallel computing, SIMD instructions, memory hierarchies, and cache optimization techniques.
  • Experience with performance analysis tools and methodologies for profiling and optimization.
  • Knowledge of deep learning frameworks and techniques is good to have.
  • Strong problem-solving skills and the ability to work independently or within a team.

Posted 7 months ago

Apply

Exploring CUDA Jobs in India

India has emerged as a hub for tech talent, with a growing demand for professionals skilled in CUDA programming. CUDA, which stands for Compute Unified Device Architecture, is a parallel computing platform and programming model developed by NVIDIA. As more companies in India look to leverage GPU acceleration for their computing needs, the demand for CUDA developers is on the rise.
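For readers new to the model, the sketch below shows the core idea behind most CUDA work: a kernel runs the same code across thousands of threads, each identified by its block and thread index. This is a minimal, illustrative example only (the kernel name vecAdd and all sizes are our own choices, not from any listing here); production code would add error checking and tune launch dimensions for the target GPU.

```cuda
// Minimal sketch of the CUDA programming model: each GPU thread adds
// one element of two input arrays.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; explicit cudaMalloc +
    // cudaMemcpy is the more common pattern in production code.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;                        // threads per block
    int blocks = (n + threads - 1) / threads; // round up to cover all n
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();                  // wait for the kernel

    printf("c[0] = %f\n", c[0]);              // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The `(n + threads - 1) / threads` pattern simply rounds the block count up so every element is covered, which is why the kernel guards with `if (i < n)`.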

Top Hiring Locations in India

  1. Bangalore
  2. Pune
  3. Hyderabad
  4. Chennai
  5. Mumbai

Average Salary Range

The average salary range for CUDA professionals in India varies based on experience:

  • Entry-level: INR 4-6 lakhs per annum
  • Mid-level: INR 8-12 lakhs per annum
  • Experienced: INR 15-20 lakhs per annum

Career Path

In the field of CUDA programming, a typical career path may include:

  1. Junior CUDA Developer
  2. CUDA Developer
  3. Senior CUDA Developer
  4. CUDA Tech Lead

Related Skills

Apart from proficiency in CUDA programming, professionals in this field are often expected to have knowledge or experience in:

  • C/C++ programming
  • Parallel computing
  • GPU architecture
  • Machine learning algorithms

Interview Questions

  • What is CUDA and how does it differ from traditional programming models? (basic)
  • Explain the difference between threads and blocks in CUDA. (basic)
  • What is shared memory in CUDA and why is it important? (medium)
  • How do you optimize memory access in CUDA programming? (medium)
  • Can you explain the concept of warp divergence in CUDA? (medium)
  • What is kernel launch overhead in CUDA and how can it be minimized? (advanced)
  • How do you handle error checking in CUDA programming? (basic; see the sketch after this list)
  • Explain the concept of coalesced memory access in CUDA. (medium)
  • What are the different types of memory available in CUDA? (basic)
  • How do you debug CUDA code? (medium)
  • Explain the purpose of the cudaMemcpy function in CUDA. (basic)
  • How do you handle synchronization in CUDA programming? (medium)
  • What is the significance of grid and block dimensions in CUDA? (basic)
  • Explain the concept of warp size in CUDA. (basic)
  • How do you optimize performance in CUDA kernels? (medium)
  • What is the difference between global, shared, and constant memory in CUDA? (medium)
  • Can you explain the concept of texture memory in CUDA? (medium)
  • How do you handle race conditions in CUDA programming? (medium)
  • What are the advantages of using CUDA for parallel computing? (basic)
  • Explain the concept of warp shuffle in CUDA. (advanced)
  • How do you handle dynamic memory allocation in CUDA? (basic)
  • What is the purpose of the nvcc compiler in CUDA programming? (basic)
  • How do you profile and optimize CUDA applications? (medium)
  • Can you explain the concept of occupancy in CUDA? (advanced)
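Several of the basic and medium questions above (error checking, shared memory, synchronization, grid and block dimensions) can be grounded in a single toy program. The sketch below is one possible answer, not a canonical one: the CUDA_CHECK macro and the blockSum kernel are illustrative names of our own, and the reduction assumes a power-of-two block size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative convenience macro: wrap runtime calls so failures
// surface immediately instead of silently corrupting later results.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Block-level sum using shared memory: consecutive threads read
// consecutive global addresses (coalesced access), then reduce in
// fast on-chip shared memory, synchronizing between steps.
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float tile[256];               // matches blockDim.x below
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                          // all loads done first
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();                      // avoid races between steps
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}

int main() {
    const int n = 1 << 16, threads = 256, blocks = n / threads;
    float *in, *out;
    CUDA_CHECK(cudaMallocManaged(&in, n * sizeof(float)));
    CUDA_CHECK(cudaMallocManaged(&out, blocks * sizeof(float)));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<blocks, threads>>>(in, out, n);
    CUDA_CHECK(cudaGetLastError());           // bad launch configuration
    CUDA_CHECK(cudaDeviceSynchronize());      // errors inside the kernel

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += out[b];
    printf("sum = %f (expect %d)\n", total, n);
    CUDA_CHECK(cudaFree(in));
    CUDA_CHECK(cudaFree(out));
    return 0;
}
```

A program like this is compiled with the nvcc compiler mentioned above (for example, `nvcc -O3 reduce.cu -o reduce`), and profilers such as NVIDIA Nsight Compute can then report occupancy and memory throughput, the subjects of the more advanced questions.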

Closing Remark

As the demand for CUDA professionals continues to grow in India, now is the perfect time to upskill and pursue career opportunities in this field. By mastering CUDA programming and related skills, you can position yourself as a valuable asset in the tech industry. Prepare diligently, showcase your expertise confidently, and embark on a rewarding career journey in CUDA development.


Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.
