11 - 16 years
30 - 35 Lacs
Posted: 1 day ago
Work from Office
Full Time
Leadership and Strategy:
Provide delivery assurance and serve as the lead design authority to ensure seamless execution of enterprise-grade container platforms (including Red Hat OpenShift and SUSE Rancher), HPE Private Cloud AI, and HPC/AI solutions, fully aligned with customer AI/ML strategies and business objectives.
Align solution architecture with NVIDIA Enterprise AI Factory design principles, including modular scalability, GPU optimization, and hybrid cloud orchestration.
Oversee planning, risk management, and stakeholder alignment throughout the project lifecycle to ensure successful outcomes.
Solution Planning and Design:
Architect and optimize end-to-end solutions across container orchestration and HPC workload management domains, leveraging platforms such as Red Hat OpenShift, SUSE Rancher, and/or workload schedulers like Slurm and Altair PBS Pro.
Ensure seamless integration of container and AI platforms with the broader software ecosystem, including NVIDIA AI Enterprise, as well as open-source DevOps and AI/ML tools and frameworks.
Opportunity Assessment:
Lead technical responses to RFPs, RFIs, and customer inquiries, ensuring alignment with business and technical requirements.
Conduct proof-of-concept (PoC) engagements to validate solution feasibility, performance, and integration within customer environments.
Assess customer infrastructure and workloads to recommend optimal configurations using validated reference architectures from HPE and strategic partners such as Red Hat, NVIDIA, and SUSE, along with components from the open-source ecosystem.
Innovation and Research:
Stay current with emerging technologies, industry trends, and best practices across HPC, Kubernetes, container platforms, hybrid cloud, and security to inform solution design and innovation.
Customer-Centric Mindset:
Act as a trusted advisor to enterprise customers, ensuring alignment of AI solutions with business goals.
Translate complex technical concepts into value propositions for stakeholders.
Team Collaboration:
Collaborate with cross-functional teams, including subject matter experts in infrastructure components such as HPE servers, storage, and networking, and data science teams, to ensure cohesive and integrated solution delivery.
Mentor technical consultants and contribute to internal knowledge sharing through tech talks and innovation forums.
Required Skills:
1. HPC & AI Infrastructure
Extensive knowledge of HPC technologies and workload schedulers such as Slurm and/or Altair PBS Pro.
Proficient in HPC cluster management tools, including HPE Cluster Management (HPCM) and/or NVIDIA Base Command Manager.
Experience with HPC cluster managers like HPE Cluster Management (HPCM) and/or NVIDIA Base Command Manager.
Good understanding of high-speed networking stacks (InfiniBand, Mellanox) and performance tuning of HPC components.
Solid grasp of high-speed networking technologies, such as InfiniBand and Ethernet.
2. Containerization & Orchestration
Extensive hands-on experience with containerization technologies such as Docker, Podman, and Singularity.
Proficiency with at least two container orchestration platforms: CNCF Kubernetes, Red Hat OpenShift, SUSE Rancher (RKE/K3s), and Canonical Charmed Kubernetes.
Strong understanding of GPU technologies, including the NVIDIA GPU Operator for Kubernetes-based environments and DCGM (Data Center GPU Manager) for GPU health and performance monitoring.
3. Operating Systems & Virtualization
Extensive experience in Linux system administration, including package management, boot process troubleshooting, performance tuning, and network configuration.
Proficient with multiple Linux distributions, with hands-on expertise in at least two of the following: RHEL, SLES, and Ubuntu.
Experience with virtualization technologies, including KVM and OpenShift Virtualization, for deploying and managing virtualized workloads in hybrid cloud environments.
4. Cloud, DevOps & MLOps
Solid understanding of hybrid cloud architectures and experience working with major cloud platforms in conjunction with on-premises infrastructure.
Familiarity with DevOps practices, including CI/CD pipelines, infrastructure as code (IaC), and microservices-based application delivery.
Experience integrating and operationalizing open-source AI/ML tools and frameworks, supporting the full model lifecycle from development to deployment.
Good understanding of cloud-native security, observability, and compliance frameworks, ensuring secure and reliable AI/ML operations at scale.
5. Networking & Protocols
Strong understanding of core networking principles, including DNS, TCP/IP, routing, and load balancing, essential for designing resilient and scalable infrastructure.
Working knowledge of key network protocols, such as S3, NFS, and SMB/CIFS, for data access, transfer, and integration across hybrid environments.
6. Programming & Automation
Proficiency in scripting or programming languages such as Python and Bash.
Experience automating infrastructure and AI workflows.
7. Soft Skills & Leadership
Excellent problem-solving, analytical thinking, and communication skills for engaging both technical and non-technical stakeholders.
Proven ability to lead complex technical projects from requirements gathering through architecture, design, and delivery.
Strong business acumen with the ability to align technical solutions with client challenges and objectives.
Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
Professional certifications in AI infrastructure, containers, and Kubernetes are highly desirable, such as RHCSA, RHCE, CNCF certifications (CKA, CKAD, CKS), and NVIDIA-Certified Associate - AI Infrastructure and Operations.
Typically 8-10 years of hands-on experience in architecting and implementing HPC, AI/ML, and container platform solutions within hybrid or private cloud environments, with a strong focus on scalability, performance, and enterprise integration.
Algoleap Technologies