Product Manager - AI Data Center Infrastructure

5 - 10 years

30 - 37 Lacs

Posted: 11 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

This role has been designed as Onsite with an expectation that you will primarily work from an HPE office.
Job Family Definition:
We are seeking a Product Line Manager (PLM) for AI Data Center Infrastructure to define and deliver next-generation data center networking platforms for large-scale GPU clusters. This role is ideal for a visionary, hands-on leader who understands how AI workloads stress networks at scale and can translate that insight into clear product requirements and roadmaps.
The successful candidate will have deep experience with data center switching platforms, high-performance Ethernet fabrics, and GPU/NIC interconnects across NVIDIA and AMD ecosystems. You will drive the architecture and product strategy for scale-up and scale-out AI fabrics, enabling deterministic performance, ultra-low latency, and operational excellence for hyperscale AI training and inference clusters.
This role requires a self-starter and go-getter who can operate independently while collaborating across engineering, operations, and strategic partners.
What you will do:
AI Data Center Fabric Architecture
  • Define product requirements for AI data center network architectures supporting thousands of GPUs.
  • Develop requirements for low-latency Ethernet fabrics using Juniper QFX platforms and Apstra-based automation.
  • Enable high-bandwidth GPU and NIC interconnects optimized for large-scale distributed training and inference workloads.
GPU, NIC & Interconnect Strategy
  • Lead requirements definition for next-generation GPUs, NICs, and interconnect technologies, staying ahead of industry roadmaps.
  • Drive alignment with:
    • NVIDIA: ConnectX (CX7/CX8), NVLink, NVSwitch
    • AMD: MI300/MI400 platforms, Pollara NICs, Infinity Fabric
  • Ensure interoperability across DAC, AEC, ACC, and optical transceivers between switches and NIC endpoints.
  • Define scale-up paths using PCIe, NVLink, NVSwitch, ensuring GPU-to-GPU symmetry, consistency, and bandwidth determinism.
Switching, Routing & Telemetry
  • Specify and optimize L2/L3 architectures, including EVPN-VXLAN, Class-E IPv4, and AI-optimized buffer tuning.
  • Leverage hardware telemetry, streaming sensors, and analytics for proactive performance assurance.
  • Drive automation using Python, Ansible, Apstra, Terraform, and related tools to enforce configuration consistency and compliance.
Performance Optimization & Troubleshooting
  • Analyze GPU job performance to identify network hotspots, congestion, packet loss, and microbursts.
  • Tune ECN, RDMA/RoCEv2, PFC, and traffic-engineering policies for AI workloads.
  • Optimize server-to-switch interactions, including:
    • BIOS and firmware alignment
    • NIC queue and link-training parameters
    • Cable selection and management (AEC/ACC/optics)
Cross-Functional & Ecosystem Collaboration
  • Partner closely with AI platform teams, GPU system architects, data center operations, and strategic vendors (NVIDIA, AMD, Juniper).
  • Lead and participate in root-cause analysis for:
    • Link flaps and training failures
    • FEC and PCS errors
    • Thermal or power-related performance degradation
  • Drive lab validation, scale testing, and certification of new optics, NIC firmware, and switch software releases.
What you need to bring:
  • 5-10+ years of experience in data center networking, AI infrastructure, or HPC environments.
  • Strong hands-on experience with Juniper QFX platforms and Junos OS.
  • Deep understanding of GPU architectures:
    • NVIDIA: H100/H200, GB200/GB300, NVLink/NVSwitch
    • AMD: MI300/MI400, Pollara NICs, Infinity Fabric
  • Proven expertise in scale-up GPU interconnects and scale-out Ethernet fabrics.
  • Strong knowledge of RDMA/RoCEv2, ECN, PFC, and buffer management.
  • Familiarity with distributed AI workloads and collective operations (NCCL, RCCL).
  • Hands-on troubleshooting experience with high-speed optics, AEC cables, link training, and NIC firmware.
  • Proficiency in automation and scripting (Python, Ansible, Bash, Terraform).
Preferred Qualifications
  • Certifications: JNCIE, CCIE, NCP-AII, NCA-AIIO, NCP-AIO, NCP-AIN
  • Experience with Apstra or other intent-based networking platforms.
  • Knowledge of 1.6T optics, 200G PAM4 SerDes, and CPO/LPO architectures.
  • Experience supporting liquid-cooled GPU clusters and rack-level power/network design.
  • Understanding of data center operations, observability, and SLAs for AI training and inference clusters.
Additional Skills:
Cross Domain Knowledge, Customer Engagement, Design Thinking, Development Fundamentals, DevOps, Go-to-Market Expertise, Partner Management, Product Lifecycle Management, Security-First Mindset, Strategic Pricing, Strategy Creation, User Experience (UX), Value Creation, Vendor Management
Job:
Engineering
Job Level:
TCP_04
HPE is an Equal Employment Opportunity/Veterans/Disabled/LGBT employer. We do not discriminate on the basis of race, gender, or any other protected category, and all decisions we make are made on the basis of qualifications, merit, and business need. Our goal is to be one global team that is representative of our customers, in an inclusive environment where we can continue to innovate and grow together.

Hewlett Packard Enterprise

IT Services and IT Consulting

Houston, Texas

hyderabad, chennai, bengaluru