
HPC Engineer - AI Workloads & Infrastructure

iVedha Inc.

Experience: 3 years

Salary: Not disclosed

Location: India

Posted: 3 days ago

Skills Required

AI, ML, Kubernetes, storage, data, integration, design, scalability, Lustre, scheduling, orchestration, network, automate, deployment, configuration, Ansible, Terraform, DevOps, training, tuning, inference, security, compliance, integrity, engineering, Linux, Ubuntu, Docker, CUDA, networking, automation, scripting, Python, optimization, Azure, AWS, PyTorch, TensorFlow

Work Mode

On-site

Job Type

Full Time

Job Description

Job Title: HPC Engineer – AI Workloads & Infrastructure

Department: Operations – High Performance Computing (HPC)

About iVedha:

iVedha

Role Overview:

We are seeking an HPC Engineer to join our operations team supporting AI workloads in a high-performance computing environment. This role focuses on building and managing HPC compute nodes, deploying Kubernetes clusters, and orchestrating bare-metal and virtualized environments. You will also work with advanced storage technologies such as VAST Data and MooseFS, ensuring seamless integration with GPU-accelerated infrastructure.

Key Responsibilities:

Design, deploy, and maintain HPC clusters for AI/ML workloads, including GPU-accelerated compute nodes (NVIDIA DGX/HGX platforms).
Implement and manage Kubernetes for containerized AI workloads, ensuring scalability and high availability.
Configure and optimize bare-metal servers, VMs, and virtualized environments for HPC applications.
Integrate and manage high-performance storage systems (VAST, MooseFS, Lustre, or similar parallel file systems).
Implement job scheduling and orchestration using Slurm or equivalent tools for AI and HPC workloads.
Monitor and tune system performance for GPU utilization, network throughput, and storage I/O.
Automate deployment and configuration using Foreman, Ansible, Terraform, or similar tools.
Collaborate with AI engineers, DevOps, and data teams to optimize infrastructure for LLM training, fine-tuning, and inference pipelines.
Ensure security, compliance, and data integrity across HPC environments.

Required Skills & Experience:

3+ years in HPC engineering, systems administration, or AI infrastructure roles.
Strong experience with Linux (RHEL/CentOS/Ubuntu) in HPC environments.
Hands-on experience with Kubernetes, Docker, and container orchestration for AI workloads.
Familiarity with GPU clusters, CUDA, NCCL, and NVIDIA ecosystem tools.
Knowledge of high-speed interconnects (InfiniBand, RoCE) and networking for HPC.
Experience with parallel/distributed file systems (VAST, MooseFS, Lustre, GPFS).
Proficiency in automation and scripting (Python, Bash, Ansible).
Understanding of job schedulers (Slurm, PBS, Torque) and workload optimization.

Nice-to-Have:

Experience with cloud HPC platforms (Azure HPC, AWS ParallelCluster, or similar).
Familiarity with AI/ML frameworks (PyTorch, TensorFlow) and MLOps pipelines.
Exposure to observability tools (Prometheus, Grafana) for HPC environments.

Why Join iVedha?

Work on cutting-edge AI infrastructure projects powering Canada’s sovereign AI ecosystem.
Collaborate with a world-class team of engineers and AI specialists.
Competitive compensation, benefits, and opportunities for career growth in HPC and AI.
