HPC System Engineer

Posted: 2 months ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

We are seeking an experienced HPC (High-Performance Computing) System Engineer to design, implement, and manage cutting-edge HPC infrastructure built on Dell servers, AMD MI210 GPUs, and Pure Storage systems. The ideal candidate will have expertise in Commvault backup systems, Kubernetes container orchestration, and multitenancy configurations, and will deliver scalable, GPU-accelerated, high-performance solutions for enterprise and HPC workloads.

Key Responsibilities:

Dell Servers:
- Architect and deploy HPC systems using Dell PowerEdge servers, ensuring high availability and optimized performance for compute-intensive applications.
- Manage the server hardware lifecycle, including deployment, upgrades, and diagnostics.
- Configure HPC cluster nodes for seamless integration with Kubernetes and GPU workloads.

AMD GPUs (MI210):
- Deploy and optimize AMD GPU-based servers to accelerate AI/ML, HPC, and data-intensive applications.
- Monitor GPU utilization, troubleshoot performance bottlenecks, and optimize workloads for GPU acceleration.
- Integrate GPUs into Kubernetes environments for containerized GPU-based applications.

Pure Storage:
- Design and manage Pure Storage solutions, including FlashBlade, to support HPC and data-intensive workloads.
- Implement multitenancy configurations for isolated, secure, and efficient resource utilization.
- Monitor storage health and optimize performance for high-speed data access.

Commvault Backup:
- Architect and manage enterprise-wide Commvault backup solutions, ensuring data integrity and readiness for disaster recovery.
- Implement backup and retention policies for HPC environments, including containerized and GPU-accelerated workloads.

Kubernetes Container Management:
- Deploy and manage Kubernetes clusters for HPC applications, ensuring scalability and fault tolerance.
- Configure persistent storage for containerized workloads and integrate storage with GPUs for high-performance data processing.
- Monitor cluster performance and troubleshoot HPC-specific Kubernetes challenges.

System Optimization and Monitoring:
- Implement advanced monitoring solutions for servers, GPUs, storage, and Kubernetes clusters to ensure peak performance.
- Develop and enforce policies for system security, resource allocation, and compliance with industry standards.
- Lead capacity planning and scaling initiatives for HPC infrastructure.

Team Leadership and Collaboration:
- Mentor and guide junior engineers on HPC best practices, system design, and troubleshooting techniques.
- Collaborate with cross-functional teams, including data scientists and DevOps engineers, to align infrastructure capabilities with organizational goals.

Qualifications:

Technical Skills:
- Extensive experience with Dell PowerEdge servers in HPC or enterprise environments.
- Proven expertise with AMD GPUs (MI210), including their integration and optimization for AI/ML and HPC workloads.
- Advanced knowledge of Pure Storage systems, including multitenancy and high-performance configurations.
- Expertise in Commvault backup systems, including design, deployment, and disaster recovery.
- Strong proficiency in Kubernetes container orchestration, particularly for GPU-accelerated applications.
- Knowledge of high-performance interconnects (e.g., RDMA, InfiniBand) and networking for HPC.
