The High-Performance Computing Infrastructure Engineer is primarily responsible for the overall health and maintenance of storage technologies in our Managed Services customers' environments. Our HPC Infrastructure Engineers are valued members of the Managed Services Infrastructure Practice, responsible for Tier 3 incident management, service request management, and change management infrastructure support for all Managed Services customers.

Roles & Responsibilities
Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities
Plan and perform maintenance activities
Assess customer environments for performance and design issues and propose resolutions
Work across technical teams to troubleshoot complex infrastructure issues
Create and maintain detailed documentation
Serve as a subject matter expert and escalation point for storage technologies
Work with vendors to resolve storage issues
Communicate transparently with customers and the internal team
Participate in the on-call rotation
Complete training and certification as assigned to further skills and knowledge

Skills Required
Bachelor's degree or equivalent in Information Systems or a related field. Unique education, specialized experience, skills, knowledge, training, or certification may be substituted for education.
5+ years of expert-level experience managing infrastructure in high-performance computing environments, including configuration, troubleshooting, and best practices.
1+ years of experience with Nvidia DGX preferred.
Experience with high-performance computing (HPC) schedulers (e.g., SLURM, PBS, Torque) required.
Experience configuring, maintaining, and troubleshooting Kubernetes.
Experience with storage technology (e.g., Ceph, Vast Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS).
Experience with machine learning or data science workflows in HPC/AI environments.
Advanced experience with Linux operating systems.
Experience configuring, maintaining, and troubleshooting Nvidia/Mellanox (Cumulus OS) switches a plus.
Experience with both Ethernet and InfiniBand networking a plus.
1+ years working with monitoring platforms (e.g., Prometheus, Grafana); Elastic Observability experience is a bonus.
1+ years working with an enterprise ITSM system; ServiceNow is a bonus.
Previous experience with automation tools such as Ansible, Puppet, or Chef a plus.
Managed Services or consulting experience is required.
Strong background in customer service.
High-level problem-solving and communication skills.
Strong oral and written communication skills.
Related network certifications are a bonus.

Ahead

www.ahead.be

Human Resources Services

Brussels

HPC Infrastructure Engineer
