Lead Solutions Architect AI Infrastructure & Private Cloud

11 - 16 years

30 - 35 Lacs

Posted: 1 day ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

We are seeking an experienced Lead Solutions Architect with deep expertise in AI/ML infrastructure, High Performance Computing (HPC), and container platforms to join our dynamic team focused on delivering HPE Private Cloud AI and Enterprise AI Factory solutions. This role is instrumental in architecting, deploying, and optimizing private cloud environments that leverage solutions HPE has co-developed with NVIDIA, as well as validated HPE reference architectures, to support enterprise-grade AI workloads at scale.

The ideal candidate will bring strong technical expertise in AI infrastructure, container orchestration platforms, and hybrid cloud environments, and will play a key role in delivering scalable, secure, and high-performance AI platform solutions powered by HPE GreenLake and NVIDIA AI Enterprise technologies.

Key Responsibilities:

Leadership and Strategy:

Provide delivery assurance and serve as the lead design authority to ensure seamless execution of enterprise-grade container platforms (including Red Hat OpenShift and SUSE Rancher), HPE Private Cloud AI, and HPC/AI solutions, fully aligned with customer AI/ML strategies and business objectives.

Align solution architecture with NVIDIA Enterprise AI Factory design principles, including modular scalability, GPU optimization, and hybrid cloud orchestration.

Oversee planning, risk management, and stakeholder alignment throughout the project lifecycle to ensure successful outcomes.

Solution Planning and Design:

Architect and optimize end-to-end solutions across container orchestration and HPC workload management domains, leveraging platforms such as Red Hat OpenShift, SUSE Rancher, and/or workload schedulers like Slurm and Altair PBS Pro.

Ensure seamless integration of container and AI platforms with the broader software ecosystem, including NVIDIA AI Enterprise, as well as open-source DevOps and AI/ML tools and frameworks.

Opportunity Assessment:

Lead technical responses to RFPs, RFIs, and customer inquiries, ensuring alignment with business and technical requirements.

Conduct proof-of-concept (PoC) engagements to validate solution feasibility, performance, and integration within customer environments.

Assess customer infrastructure and workloads to recommend optimal configurations using validated reference architectures from HPE and strategic partners such as Red Hat, NVIDIA, and SUSE, along with components from the open-source ecosystem.

Innovation and Research:

Stay current with emerging technologies, industry trends, and best practices across HPC, Kubernetes, container platforms, hybrid cloud, and security to inform solution design and innovation.

Customer-Centric Mindset:

Act as a trusted advisor to enterprise customers, ensuring alignment of AI solutions with business goals.

Translate complex technical concepts into value propositions for stakeholders.

Team Collaboration:

Collaborate with cross-functional teams, including data science teams and subject matter experts in infrastructure components such as HPE servers, storage, and networking, to ensure cohesive and integrated solution delivery.

Mentor technical consultants and contribute to internal knowledge sharing through tech talks and innovation forums.

Required Skills:

1. HPC & AI Infrastructure

Extensive knowledge of HPC technologies and workload schedulers such as Slurm and/or Altair PBS Pro.

Proficiency and hands-on experience with HPC cluster management tools, including HPE Performance Cluster Manager (HPCM) and/or NVIDIA Base Command Manager.

Good understanding of high-speed networking stacks (InfiniBand, Mellanox) and performance tuning of HPC components.

Solid grasp of high-speed networking technologies, such as InfiniBand and Ethernet.

2. Containerization & Orchestration

Extensive hands-on experience with containerization technologies such as Docker, Podman, and Singularity.

Proficiency with at least two container orchestration platforms: CNCF Kubernetes, Red Hat OpenShift, SUSE Rancher (RKE/K3s), and Canonical Charmed Kubernetes.

Strong understanding of GPU technologies, including the NVIDIA GPU Operator for Kubernetes-based environments and DCGM (Data Center GPU Manager) for GPU health and performance monitoring.

3. Operating Systems & Virtualization

Extensive experience in Linux system administration, including package management, boot process troubleshooting, performance tuning, and network configuration.

Proficient with multiple Linux distributions, with hands-on expertise in at least two of the following: RHEL, SLES, and Ubuntu.

Experience with virtualization technologies, including KVM and OpenShift Virtualization, for deploying and managing virtualized workloads in hybrid cloud environments.

4. Cloud, DevOps & MLOps

Solid understanding of hybrid cloud architectures and experience working with major cloud platforms in conjunction with on-premises infrastructure.

Familiarity with DevOps practices, including CI/CD pipelines, infrastructure as code (IaC), and microservices-based application delivery.

Experience integrating and operationalizing open-source AI/ML tools and frameworks, supporting the full model lifecycle from development to deployment.

Good understanding of cloud-native security, observability, and compliance frameworks, ensuring secure and reliable AI/ML operations at scale.

5. Networking & Protocols

Strong understanding of core networking principles, including DNS, TCP/IP, routing, and load balancing, essential for designing resilient and scalable infrastructure.

Working knowledge of key network protocols, such as S3, NFS, and SMB/CIFS, for data access, transfer, and integration across hybrid environments.

6. Programming & Automation

Proficiency in scripting or programming languages such as Python and Bash.

Experience automating infrastructure and AI workflows.

7. Soft Skills & Leadership

Excellent problem-solving, analytical thinking, and communication skills for engaging both technical and non-technical stakeholders.

Proven ability to lead complex technical projects from requirements gathering through architecture, design, and delivery.

Strong business acumen with the ability to align technical solutions with client challenges and objectives.

Qualifications:

Bachelor's or master's degree in Computer Science, Information Technology, or a related field.

Professional certifications in AI infrastructure, containers, and Kubernetes are highly desirable, such as RHCSA, RHCE, CNCF certifications (CKA, CKAD, CKS), and NVIDIA-Certified Associate - AI Infrastructure and Operations.

Typically, 8-10 years of hands-on experience in architecting and implementing HPC, AI/ML, and container platform solutions within hybrid or private cloud environments, with a strong focus on scalability, performance, and enterprise integration.

Algoleap Technologies

Information Technology

San Francisco

50-100 Employees

57 Jobs

Key People

  • John Doe, CEO
  • Jane Smith, CTO
