Posted: 21 hours ago
Work from Office
Full Time
Position Summary

The Infrastructure Specialist is responsible for the end-to-end setup, configuration, and management of server infrastructure in both on-premises and cloud environments. This role is critical to supporting AI lab as a service and to ensuring scalable, secure, high-performance server solutions. You will deploy and maintain server environments and application workloads, including VMware-based infrastructure, and ensure full connectivity across AI workloads. The role requires expertise in server configuration, deployment, and maintenance, along with experience managing both traditional and cloud-native infrastructure. You will work closely with technical teams, multiple stakeholders, application teams, ecosystems, site and platform reliability engineering teams, and global IT teams to deliver cutting-edge server infrastructure for high-performance computing and AI-driven solutions.

Key Responsibilities

Technical Leadership
Server Infrastructure Setup & Management: Lead the design, configuration, and management of both on-prem and cloud server infrastructures to support AI workloads and cloud-native applications.
VMware Expertise: Install, configure, and maintain VMware-based infrastructure, including vSphere, vCenter, and ESXi hosts. This also includes hands-on experience with platforms from other OEMs such as Nvidia and Intel.
AI Lab Infrastructure: Deploy and manage the server-side infrastructure that enables AI lab environments, including setting up high-performance computing servers.
End-to-End Connectivity: Ensure seamless connectivity between on-prem and cloud environments, integrating storage, network, and compute resources effectively.

Service Delivery & Innovation
Cloud and On-Prem Server Management: Maintain and optimize hybrid cloud environments, ensuring seamless integration between cloud platforms (AWS, Azure, GCP) and on-prem infrastructure.
Server Configuration & Deployment: Oversee server configuration, deployment, and continuous optimization for performance, scalability, and reliability.
Infrastructure Automation: Automate server provisioning, deployment, and configuration using infrastructure-as-code tools such as Ansible, Terraform, or similar.
Monitoring & Performance Optimization: Implement monitoring and observability solutions to ensure optimal performance and early detection of infrastructure issues.

Thought Leadership and Client Engagements
Client Interaction: Provide expert advice on server infrastructure design, setup, and optimization to clients in various sectors, with a focus on AI workloads and hybrid cloud environments.
Workshops & Training: Lead client workshops on best practices for server infrastructure management and optimization.
Strategic Guidance: Help clients plan their server infrastructure needs, aligning solutions with business goals and future growth.

Mandatory Skills & Experience

Hands-on Expertise in Server Infrastructure: Strong experience with server setup, configuration, and management in both on-premises and cloud environments. Expertise in managing VMware infrastructure (vSphere, vCenter, ESXi), along with platforms from other OEMs such as Nvidia and Intel. Experience in handling high-performance computing (HPC) servers, particularly in AI labs and GPU-based environments.
Cloud Integration and Hybrid Environments: Solid understanding of cloud server management across AWS, Azure, and GCP. Experience in hybrid cloud server setups, managing workloads across both on-prem and cloud environments.
Server Automation & Configuration Management: Expertise in automation tools such as Ansible and Terraform, and in scripting (e.g., Python, Shell). Proficiency in server provisioning, configuration, and deployment using infrastructure-as-code practices.
Infrastructure Monitoring & Management: Strong understanding of infrastructure monitoring tools (e.g., Prometheus, Grafana, Nagios). Ability to manage server performance and optimize resource utilization across cloud and on-prem environments.
Security & Compliance: Experience in configuring and maintaining secure server environments, including access control, encryption, and backup strategies. Familiarity with regulatory compliance standards for infrastructure management (e.g., GDPR, HIPAA).

Certifications (Added Advantage)
VMware Certified Professional (VCP) or higher.
AWS Certified Solutions Architect or Azure Certified Solutions Architect.
Certified Kubernetes Administrator (CKA).
Certification in infrastructure automation tools (e.g., Ansible, Terraform).

Desired Skills & Experience

Advanced Server & Virtualization Skills: Expertise in server virtualization technologies, especially VMware, Hyper-V, and cloud-based solutions. Experience in deploying and managing high-availability (HA) server environments, ensuring minimal downtime and optimal reliability.
AI Infrastructure Management: Experience in managing AI lab environments, including setting up GPU-based servers, clusters, and AI workloads on both on-prem and cloud infrastructures. Knowledge of AI frameworks (TensorFlow, PyTorch) and their integration with server environments.
Hybrid Cloud Expertise: Proficiency in managing cloud-native server infrastructure on cloud platforms such as AWS, Azure, or GCP. Familiarity with serverless technologies and container orchestration (e.g., Kubernetes, OpenShift, Docker).
Collaboration & Client Engagement: Strong communication skills to lead technical discussions and workshops with clients. Experience in presenting complex server infrastructure designs and solutions to both technical and non-technical stakeholders.

Soft Skills and Behavioural Competencies

Communication & Collaboration: Exceptional communication skills, with the ability to engage with technical teams and clients effectively and lead cross-functional projects.
Problem-Solving: Ability to troubleshoot complex infrastructure issues across cloud and on-prem environments.
Innovation & Learning: A keen interest in staying up to date with new server technologies, cloud advancements, and AI infrastructure.
Client-Oriented: Proven ability to understand and meet client needs, offering tailored server solutions that align with business goals.
Adaptability: Ability to work in a fast-paced, ever-evolving environment with a focus on continuous learning and process improvements.

HCLTech
4.0 - 8.0 Lacs P.A.