HPC Infrastructure Engineer

2 - 5 years

2 - 5 Lacs

Posted: 12 hours ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Roles & Responsibilities

  • Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities
  • Plan and perform maintenance activities
  • Assess customer environments for performance and design issues and propose resolutions
  • Work across technical teams to troubleshoot complex infrastructure issues
  • Create and maintain detailed documentation
  • Serve as a subject matter expert and escalation point for storage technologies
  • Work with vendors to resolve storage issues
  • Communicate transparently with customers and internal teams
  • Participate in on-call rotation
  • Complete assigned training and certifications to further skills and knowledge

Skills Required

  • Bachelor's degree or equivalent in Information Systems or a related field
  • 5+ years of expert-level experience managing infrastructure in high-performance computing environments
  • 1+ years of experience with Nvidia DGX preferred
  • Experience with HPC schedulers (e.g., SLURM, PBS, Torque)
  • Experience configuring, maintaining, and troubleshooting Kubernetes
  • Experience with storage technology (e.g., Ceph, Vast Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS)
  • Experience with machine learning or data science workflows in HPC/AI environments
  • Advanced experience with Linux operating systems
  • Experience with Nvidia/Mellanox (Cumulus OS) switches is a plus
  • Experience with Ethernet and InfiniBand networking is a plus
  • 1+ years working with monitoring platforms (e.g., Prometheus, Grafana); Elastic Observability experience is a bonus
  • 1+ years working with enterprise ITSM systems (ServiceNow is a bonus)
  • Experience with automation tools such as Ansible, Puppet, or Chef is a plus
  • Managed Services or consulting experience is required
  • Strong background in customer service
  • High-level problem-solving skills
  • Strong oral and written communication skills
  • Related network certifications are a bonus

Why AHEAD

  • Diversity-focused workplace with initiatives like Moving Women AHEAD and RISE AHEAD
  • Multi-million-dollar lab and cross-department training
  • Sponsorship for certifications and ongoing learning

USA Employment Benefits Include

  • Medical, Dental, and Vision Insurance
  • 401(k)
  • Paid company holidays
  • Paid time off
  • Paid parental and caregiver leave
  • Additional benefits listed at https://www.aheadbenefits.com/

Note: The OTE range includes base salary and target bonus and may vary by experience and location.
