Posted: 6 days ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

  • Design, deploy and configure HPC Clusters including compute, storage and networking components.
  • Handle installation requests on HPC, application upgrades, and troubleshooting in coordination with users, software vendors and OEMs.
  • Administer job schedulers (e.g., Slurm), manage user access, monitor health and troubleshoot system issues both on-prem and in the cloud.
  • Optimize HPC workloads, tune resource utilization and benchmark system performance.
  • Install and maintain HPC hardware, software stacks, compilers, libraries (e.g., MPI, OpenMP) and custom tools. Configure VMs, storage and servers on cloud.
  • Guide and assist users in optimizing and running applications on the cluster and cloud. Ensure system stability through regular updates, proactive monitoring and software/hardware troubleshooting.

Responsibilities

  • Supervise day-to-day support operations for the HPC and Cloud team and support adherence to ticket SLAs.
  • Manage support ticket systems, primarily using internal IT tools.
  • Ensure timely resolution of user issues related to CAE applications in HPC & Cloud.
  • Plan, schedule, and oversee application upgrades and installations.
  • Collaborate with internal teams and external vendors to ensure seamless issue resolution.
  • Generate detailed performance reports monthly, analysing key trends and areas for improvement.

Technical Skills

  • Operating Systems: Expertise in Linux (RHEL, CentOS, Ubuntu)
  • HPC Tools and Frameworks:
    1. Job Schedulers: Slurm, PBS & Sync-HPC
    2. Parallel Programming: MPI, OpenMP, CUDA
    3. Scripting: Python, Bash and optionally C/C++
  • Cloud: Knowledge of AWS, GCP & Azure, including HPC toolkits and VM & object storage creation.
  • Networking: Knowledge of high-speed networks (InfiniBand, RDMA, Ethernet)
  • Storage Systems: Experience with parallel and network file systems (Lustre, NFS)
  • Hardware: Familiarity with HPC-specific hardware, including RAM, CPU & GPU

Certifications

  • Any Cloud Solution Architect Certificate (GCP preferred)
  • RHEL Certified System Administrator (Preferred)
