Job Description
As a Senior Kubernetes Platform Engineer at our client corporation, you will design and implement the Zero-Touch Build, Upgrade, and Certification pipeline for the on-premises GPU cloud platform. Your primary focus will be automating the Kubernetes layer and its dependencies using GitOps workflows to ensure a fully declarative, scalable, and reproducible infrastructure stack.

- Architect and implement GitOps-driven Kubernetes cluster lifecycle automation using tools like kubeadm, Cluster API, Helm, and Argo CD.
- Develop and manage declarative infrastructure components for GPU stack deployment, container runtime configuration, and networking layers.
- Lead automation efforts for zero-touch upgrades and certification pipelines for Kubernetes clusters and associated workloads.
- Maintain Git-backed sources of truth for all platform configurations and integrations.
- Standardize deployment practices across multi-cluster GPU environments for scalability, repeatability, and compliance.
- Drive observability, testing, and validation as part of the continuous delivery process.
- Collaborate with infrastructure, security, and SRE teams to ensure seamless handoffs between lower layers and the Kubernetes platform.
- Mentor junior engineers and contribute to the platform automation roadmap.

To succeed in this role, you should meet the following qualifications:

- 10+ years of hands-on experience in infrastructure engineering, with a strong focus on Kubernetes-based environments.
- Proficiency in the Kubernetes API, Helm templating, Argo CD GitOps integration, Go/Python scripting, and containerd.
- Deep knowledge of and hands-on experience with Kubernetes cluster management, Argo CD, Helm, containerd, NVIDIA GPU Operator, CNI plugin ecosystems, network policies, and GitOps workflows.
- Experience in building Kubernetes clusters in on-prem environments and managing multi-cluster, GPU-accelerated workloads with high availability and security.
- Strong scripting and automation skills in Bash, Python, or Go, familiarity with Linux internals, systemd, and OS-level tuning for container workloads.
- Bonus: Experience with custom controllers, operators, Kubernetes API extensions, contributions to Kubernetes or CNCF projects, and exposure to service meshes, ingress controllers, or workload identity providers.