HPC Linux System Administrator

4 - 8 years

4 Lacs

Posted: 2 hours ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description

What you'll do:
Responsibilities:
  • Work hands-on: develop a solid understanding of the Linux system and be able to test it.
  • Manage and maintain HPC clusters, including installation, configuration, and optimization of compute and management nodes.
  • Administer Linux/Unix-based systems, ensuring high availability, performance, and security.
  • Perform system imaging, software provisioning, and configuration management using tools such as Ansible.
  • Conduct hardware troubleshooting and coordinate with vendors or internal teams for hardware repairs and replacements.
  • Oversee lab systems used for development, testing, and release validation in HPC environments.
  • Manage storage systems (NFS, Lustre, GPFS, RAID) and ensure efficient data flow across the HPC environment.
  • Monitor system performance, perform regular health checks, and implement preventive maintenance measures.
  • Apply OS, firmware, and security updates to maintain system stability and compliance.
  • Develop and maintain automation scripts (using Bash, Python, or Ansible) to improve operational efficiency.
  • Document system configurations, maintenance procedures, and troubleshooting guides.
  • Collaborate with cross-functional teams across geographies to resolve issues, plan upgrades, and support project activities.
  • Provide guidance and mentoring to less experienced staff members.
What you need to bring:
Education and Experience Required:
  • Bachelor's or Master's engineering degree in Computer Science or Information Systems.
  • Typically 4 to 8 years of experience.
Knowledge and Skills:
  • Strong proficiency in Linux/Unix administration (installation, configuration, tuning, troubleshooting).
  • Experience managing HPC clusters (e.g., HPE Cray, Slurm, PBS, LSF).
  • Solid understanding of networking fundamentals (TCP/IP, DNS, DHCP, VLANs).
  • Experience with storage management systems such as NFS, Lustre, or GPFS.
  • Hands-on experience in hardware diagnostics and maintenance.
  • Familiarity with system monitoring tools such as Prometheus, Grafana, or Nagios.
  • Working knowledge of containerization (Docker, Singularity) and virtualization technologies is a plus.
  • Proficiency in shell scripting (Bash).
  • Familiarity with Python or Ansible for automation and orchestration.
  • Ability to automate routine tasks and enhance operational efficiency.
  • Strong troubleshooting and problem-solving skills with a focus on root cause analysis.
  • Experience in maintaining accurate system documentation and change logs.
Additional Skills:
Cloud Architectures, Cross Domain Knowledge, Design Thinking, Development Fundamentals, DevOps, Distributed Computing, Microservices Fluency, Full Stack Development, Security-First Mindset, Solutions Design, Testing & Automation, User Experience (UX)

Hewlett Packard Enterprise

IT Services and IT Consulting

Houston, Texas