Domain:
Information Technology (IT)
Industry:
Product-based technology experience is mandatory
About The Role
We are seeking an experienced HPC DevOps Engineer with strong expertise in high-performance computing (HPC) systems, Linux, and cluster management. This role involves end-to-end ownership of HPC design, implementation, and support, with a focus on scalability, reliability, and performance optimization. Candidates must have solid product company experience and demonstrate strong technical leadership in HPC and cloud-based computing environments.
Key Responsibilities
- Design, implement, and support high-performance compute (HPC) clusters.
- Work with CPU/GPU architectures, scalable storage, high-bandwidth interconnects, and cloud-based computing architectures.
- Generate hardware BOMs, manage vendors, and oversee hardware release activities.
- Configure Linux operating systems for HPC environments.
- Translate project specifications and performance requirements into subsystem/system designs.
- Ensure project deliverables are met on time and with high quality.
- Support design and release of new products, including golden images, procedures, scripts, and documentation for manufacturing and support teams.
Required Qualifications
- Minimum 7 years of experience in:
  - HPC systems and clusters
  - Linux systems (SUSE, Red Hat, Rocky, Ubuntu)
  - HPC hardware (servers, GPUs, networking, storage, BIOS, BMC)
  - TCP/IP fundamentals and protocols (DNS, DHCP, HTTP, LDAP, SMTP)
- Strong scripting skills in Shell and Python.
- Hands-on experience with systemd, netboot/PXE, and Linux HA.
- Configuration management experience (Salt, Chef, Puppet, etc.).
- Education: BE/BTech/MCA/MSc in Computer Engineering, Electrical Engineering, or related fields (Diploma / 3-year degree holders not eligible).
- Must show a stable employment history (minimum 2 years per organization).
- Product company experience is mandatory.
Preferred Qualifications
- DevOps exposure: CI/CD pipelines (Jenkins), Git-based repositories, and Singularity and Docker containers.
- Experience with Kubernetes, Prometheus, Grafana.
- Web/application server knowledge: Apache/Nginx, load balancing (HAProxy), proxy/reverse proxy setup.
- Strong understanding of storage systems and robust architectures.
Skills & Abilities
- Strong interpersonal and teamwork skills with the ability to collaborate across levels.
- Excellent organizational, time management, and multitasking capabilities.
- Adaptability in dynamic, fast-paced environments.
- Excellent verbal and written communication skills.
Additional Information
- Notice Period: Immediate to 60 days
- Interview Process: 3 technical rounds + 1 HR discussion
✅ This role is best suited for HPC/Cloud Engineers with proven product company experience.
❌ Candidates whose experience is only DevOps-focused will not be a fit.
Skills: cloud, proxy, linux, computing, devops