Lead Solutions Architect - AI Infrastructure & Private Cloud

8 - 13 years

30 - 40 Lacs

Posted: 21 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Required Skills:

HPC & AI Infrastructure

  • Extensive knowledge of HPC technologies and workload schedulers such as Slurm and/or Altair PBS Pro.
  • Proficient in HPC cluster management tools, including HPE Performance Cluster Manager (HPCM) and/or NVIDIA Base Command Manager.
  • Solid understanding of high-speed networking technologies (InfiniBand, Mellanox, Ethernet) and performance tuning of HPC components.

Containerization & Orchestration

  • Extensive hands-on experience with containerization technologies such as Docker, Podman, and Singularity.
  • Proficiency with at least two container orchestration platforms: CNCF Kubernetes, Red Hat OpenShift, SUSE Rancher (RKE/K3s), and Canonical Charmed Kubernetes.
  • Strong understanding of GPU technologies, including the NVIDIA GPU Operator for Kubernetes-based environments and DCGM (Data Center GPU Manager) for GPU health and performance monitoring.

Operating Systems & Virtualization

  • Extensive experience in Linux system administration, including package management, boot process troubleshooting, performance tuning, and network configuration.
  • Proficient with multiple Linux distributions, with hands-on expertise in at least two of the following: RHEL, SLES, and Ubuntu.
  • Experience with virtualization technologies, including KVM and OpenShift Virtualization, for deploying and managing virtualized workloads in hybrid cloud environments.

Cloud, DevOps & MLOps

  • Solid understanding of hybrid cloud architectures and experience working with major cloud platforms in conjunction with on-premises infrastructure.
  • Familiarity with DevOps practices, including CI/CD pipelines, infrastructure as code (IaC), and microservices-based application delivery.
  • Experience integrating and operationalizing open-source AI/ML tools and frameworks, supporting the full model lifecycle from development to deployment.
  • Good understanding of cloud-native security, observability, and compliance frameworks, ensuring secure and reliable AI/ML operations at scale.

Networking & Protocols

  • Strong understanding of core networking principles, including DNS, TCP/IP, routing, and load balancing, essential for designing resilient and scalable infrastructure.
