Senior High Performance Computing Engineer

6 - 8 years

6 - 10 Lacs

Posted: 1 week ago | Platform: Foundit

Skills Required

CI/CD pipeline, Linux/Unix

Work Mode

On-site

Job Type

Full Time

Job Description

Senior High-Performance Computing (HPC) Engineer

Roles & Responsibilities

  • Infrastructure Management:

    Implement and manage cloud-based infrastructure that supports HPC environments. Ensure the security, scalability, and reliability of these systems.
  • Collaboration & Optimization:

    Work closely with data scientists and ML engineers to deploy scalable machine learning models. Optimize cloud resources for cost-effective and efficient use.
  • Automation & Monitoring:

    Develop and maintain CI/CD pipelines for deploying resources to multi-cloud environments. Monitor and troubleshoot cluster operations and cloud environments.
  • Technical Leadership:

    Provide technical leadership and guidance in cloud and HPC systems management. Document system design and operational procedures.

Qualifications

  • A Bachelor's degree in Computer Science, IT, or a related field with hands-on experience in HPC administration.
  • Expert Linux/Unix system administration experience (RHEL, CentOS, Ubuntu, etc.).
  • Proficiency with job scheduling and resource management tools (SLURM, PBS, LSF).
  • Good understanding of parallel computing, MPI, OpenMP, and GPU acceleration (CUDA, ROCm).
  • Knowledge of storage architectures and distributed file systems (Lustre, GPFS, Ceph).
  • Expertise in scripting languages (Python, Bash) and containerization technologies (Docker, Kubernetes).
  • Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation, and with Git.
  • Experience in cloud computing (AWS, Azure, GCP) and a strong understanding of cloud architecture.
  • Red Hat Certified Engineer (RHCE) or AWS Certified Solutions Architect certification is preferred.

Skills & Competencies

  • Problem-Solving:

    Strong analytical and problem-solving skills, with expertise in root-cause analysis and troubleshooting.
  • Communication:

    Excellent communication and documentation skills are essential.
  • Collaboration:

    The ability to work effectively with global, virtual, and cross-functional teams in a fast-paced, cloud-first environment.
  • Technical:

    Experience with multi-cloud environments, machine learning frameworks (TensorFlow, PyTorch), and distributed computing technologies is a plus.
  • Onsite & On-Call:

    This position is onsite and involves a 24/5 and weekend on-call rotation, with the possibility of working later shifts.

Hyderabad / Secunderabad, Telangana, India