HPC Infrastructure Engineer | Mumbai, Maharashtra, India | 5 years | Salary: Not disclosed | On-site | Full Time

We’re Hiring: HPC Infrastructure Engineer
📍 Location: India (candidates must be comfortable relocating to the UAE)
🕒 Experience: 5+ years
💼 Employment Type: Full-Time
⸻
🔧 Job Summary:
We are seeking a highly skilled High-Performance Computing (HPC) Infrastructure Engineer to join our IT infrastructure team. This role focuses on designing, deploying, and maintaining robust HPC systems that support advanced computing and data-intensive applications. You will play a key role in ensuring the performance, reliability, and scalability of compute and storage infrastructure. The role also covers incident response, service requests, and change management across HPC environments in a managed-services setting.
⸻
🛠️ Roles and Responsibilities:
• Design, implement, and manage high-performance network architectures for HPC clusters.
• Configure and optimize InfiniBand and Ethernet switches, routers, and interconnects.
• Ensure high availability, redundancy, and fault tolerance in HPC systems.
• Deploy and maintain HPC clusters, monitor job scheduling, and ensure optimal system health.
• Troubleshoot compute node hardware and software issues and implement performance improvements.
• Maintain storage systems (Ceph, VAST Data, Lustre, GPFS, NFS, GlusterFS) and ensure fast, reliable access from the clusters.
• Configure and manage InfiniBand fabrics, upgrade firmware, and monitor fabric performance.
• Use tools such as Grafana, Prometheus, Ganglia, and UFM for cluster and network monitoring.
• Work closely with researchers and data scientists to support HPC/AI workloads.
• Assist in debugging, tuning, and optimizing distributed applications.
• Create and maintain high-level design (HLD) and low-level design (LLD) documentation.
⸻
📚 Required Experience:
• 5+ years managing infrastructure in HPC environments.
• Strong background in data center operations: servers, switches, routers, and storage.
• Proficiency in NVIDIA/Mellanox (Cumulus OS) switch configuration and troubleshooting.
• Hands-on experience with monitoring tools: Prometheus, Grafana, Elastic Observability.
• Experience with HPC schedulers: SLURM, PBS, or Torque.
• Experience setting up and maintaining Kubernetes environments.
• Familiarity with ML and data science workflows in HPC/AI environments.
• Strong Linux administration experience.
⸻
💡 Skills & Knowledge:
• Deep understanding of Ethernet and InfiniBand networks.
• Proficiency in distributed storage and file systems.
• Expertise in diagnosing and resolving complex infrastructure issues.
• Collaborative team player with strong communication skills.
• Ability to design and document complex system architectures.
⸻
🎓 Qualifications:
• Bachelor’s or Master’s degree in Computer Science, IT, or a related field, or equivalent experience.
⸻
📜 Certifications (Preferred):
• Red Hat Certified Engineer (RHCE)
• Cisco Certified Network Associate (CCNA)
• AWS Certified Solutions Architect

Data Centre Network Engineer | Maharashtra | 8-12 years | Salary (INR): Not disclosed | On-site | Full Time

You are invited to join our team as a Data Center Network Engineer specializing in NVIDIA switches. As a senior professional with more than 8 years of experience, you will play a crucial role in designing, deploying, maintaining, and supporting cutting-edge data center networks. Your work will center on high-performance infrastructure for AI, enterprise, and cloud workloads, built on NVIDIA Ethernet switches (Cumulus OS) and modern network automation tools.

Your key responsibilities will include designing, implementing, and troubleshooting data center networks using NVIDIA/Cumulus OS switches. You will provide expert-level guidance on Ethernet networking, collaborate with cross-functional teams on projects, and lead or support network automation using tools such as Ansible and Puppet as well as Python scripting. You will also handle complex Layer 2/3 network issues, ensure network performance, reliability, and security, and assist pre-sales teams with technical support and solution design.

The ideal candidate has at least 8 years of experience in data center networking within enterprise-grade environments. Proficiency in NVIDIA/Cumulus OS Ethernet switch configuration and troubleshooting is essential, along with a deep understanding of TCP/IP, routing protocols, and data center network architecture. Hands-on experience with network automation tools and Python scripting is required, as is expertise in VXLAN, OTV, EVPN, Layer 2/3 protocols, QoS, and modern monitoring tools. Familiarity with SAN/Fibre Channel and storage networking, proven leadership skills, and effective cross-functional communication are highly valued.

A Bachelor's degree in Computer Science, Engineering, IT, or a related field is required. A CCIE Data Center certification or equivalent (JNCIE, ACE-E) is mandatory for this role, and additional certifications in Linux or InfiniBand networking are preferred. Candidates must also have at least 3 years of continuous experience within a single organization.

If you are passionate about network engineering, have a strong background in data center networking, and have expertise in NVIDIA switches and related technologies, we encourage you to apply for this challenging and rewarding opportunity. Join us in shaping the future of high-performance data center networks and make a significant impact on AI, enterprise, and cloud computing.

Meridian Placements Services Private Limited