Posted: 3 weeks ago
On-site
Full Time
Job Title: HPC Network Engineer
Location: Mumbai
Experience: Minimum 5 years of relevant network experience

Job Overview:
We are seeking a highly skilled and experienced HPC Network Engineer to join our team. The ideal candidate will have a strong background in setting up and managing high-performance computing (HPC) networks with cutting-edge technologies such as 400G and 800G network connectivity. This role involves designing, implementing, and troubleshooting complex network architectures tailored to HPC and GPU-based systems. The engineer will also play a critical role in enabling efficient GPU interconnects and scaling AI and HPC workloads.

Key Responsibilities:
HPC Network Deployment: Design, deploy, and maintain HPC networks with 400G/800G connectivity. Optimize network performance for large-scale computing environments.
Advanced Networking Expertise: Deep understanding and hands-on experience with RoCE (RDMA over Converged Ethernet) and InfiniBand technologies. Collaborate with cross-functional teams to architect and implement robust HPC networking solutions.
Architectural Design and Communication: Develop and present complex network architectures to technical and non-technical stakeholders. Translate customer requirements into scalable and efficient network designs.
GPU Communication Frameworks: Expertise in NVLink and NVSwitch for high-speed GPU-to-GPU communication. Optimize interconnects for distributed training and inference workloads.
Technology Expertise: Hands-on experience with switches and networking equipment from Broadcom, Arista, Mellanox, Juniper, Cisco, SONiC, or Dell. Familiarity with NVIDIA, AMD, and Intel HPC architectures and their network integration requirements.
Storage Networking for HPC and AI: Integrate GPUDirect Storage and NVMe-oF for efficient data movement between storage and GPUs. Optimize data pathways for high-speed storage access in HPC workloads.
Problem Solving and Troubleshooting: Monitor, analyze, and troubleshoot network performance issues. Implement monitoring tools to ensure high availability and reliability of the HPC network.
Customer-Centric Solutions: Engage with customers to understand their requirements and deliver tailored solutions on the fly. Provide ongoing support and documentation for implemented solutions.
Comprehensive Network Knowledge: Expertise in end-to-end network monitoring, analysis, troubleshooting, and implementation. Stay updated on industry trends, standards, and best practices for HPC networking.
AI and HPC Workload Integration: Support hybrid workloads combining AI and traditional HPC tasks. Scale large language models and scientific simulations across GPU clusters with minimal latency.

Required Skills and Qualifications:
Minimum 5 years of hands-on experience in core network engineering.
Proven expertise in configuring and managing 400G or 800G network environments.
Strong knowledge of RoCE and InfiniBand protocols.
Hands-on experience with NVLink and NVSwitch in GPU-based environments.
Familiarity with networking equipment and technologies from vendors such as Broadcom, Arista, Mellanox, Juniper, Cisco, SONiC, or Dell.
Experience working with NVIDIA, AMD, or Intel HPC and GPU architectures.
Ability to conceptualize and explain complex network designs to diverse audiences.
Strong analytical and troubleshooting skills in high-performance environments.
Excellent communication and customer engagement skills to address requirements and provide solutions effectively.

Preferred Qualifications:
Industry certifications such as CCNP, CCIE, or equivalent.
Experience in scripting and automation for network operations.
Exposure to large-scale HPC deployments in data center environments.
Knowledge of software-defined networking (SDN) and virtualized networking environments.
Familiarity with AI-specific frameworks like TensorFlow, PyTorch, or Horovod in distributed setups.

Why Join Us?
Work on cutting-edge HPC and GPU-based technologies.
Collaborate with industry leaders in AI and cloud infrastructure.
Competitive compensation and growth opportunities.
Opportunity to work in a dynamic and fast-paced environment.

Stealth AI Startup