Job Description
About The Role
Key Responsibilities
- Design, build, and manage highly available Kubernetes clusters across hybrid environments (on-premises and cloud platforms such as AWS EKS and Azure AKS).
- Deploy and manage applications manually using tools such as kubectl and Helm, with growing integration of GitOps practices (e.g., ArgoCD).
- Implement and manage observability stacks using Prometheus, Grafana, Loki, and Mimir to monitor infrastructure, applications, and system performance.
- Define, monitor, and improve SLA/SLO/SLI metrics and alerting systems to ensure platform reliability.
- Automate provisioning and configuration of infrastructure using Terraform, Helm, and scripting languages (e.g., Bash, Python).
- Plan, implement, and test backup and disaster recovery (DR) strategies using tools such as Velero and Commvault.
- Manage Kubernetes-native networking, storage, and security configurations (Ceph, NFS, Ingress, PodSecurityPolicies, etc.).
- Configure and enforce Kubernetes security best practices using RBAC, OPA/Gatekeeper, NetworkPolicies, and secrets management tools.
- Integrate and operate Kubernetes ecosystem tools such as Karpenter, MicroK8s, service meshes, and kubectl plugins.
- Conduct root cause analysis (RCA) and lead resolution efforts for incidents.
- Participate in the on-call rotation for platform availability and incident management.
- Maintain up-to-date documentation, architecture diagrams, runbooks, and SOPs.
- Mentor engineers and advocate for Kubernetes, security, observability, and deployment best practices across teams.
- Stay informed of industry trends in container orchestration, GitOps, security, and cloud-native tooling.
Required Qualifications
- 6 to 13 years of IT/Infrastructure/DevOps experience, with 5+ years in Kubernetes operations in production environments.
- Strong hands-on experience in Kubernetes architecture, cluster operations, and manual application deployment practices.
- Intermediate-level experience in Kubernetes security, including:
  - Cluster hardening and secrets management
  - Pod Security Standards (PSS) and OPA/Gatekeeper
  - Network policies, image scanning, and runtime protections
- Intermediate experience with ArgoCD for GitOps-style Kubernetes deployments.
- Solid proficiency in Linux system administration (Ubuntu, CentOS, RHEL) and troubleshooting.
- Hands-on experience with Kubernetes-native storage (e.g., Ceph, NFS) and persistent volume provisioning.
- Strong familiarity with observability tools: Grafana, Prometheus, Loki, Mimir, etc.
- Proficiency in Infrastructure as Code using Terraform, Helm, and scripting.
- Experience with Velero, Commvault, or similar tools for backup and DR.
- Experience operating and optimizing cloud-native Kubernetes platforms such as EKS and AKS.
- Exposure to tools such as Karpenter, MicroK8s, service meshes, and ingress controllers.
- Familiarity with AI/ML workloads running on Kubernetes is a plus.
- Excellent collaboration, communication, documentation, and incident resolution skills.
Preferred Qualifications
- Kubernetes certifications: CKA, CKAD, or CKS.
- Strong understanding of container security, networking, and distributed system architecture.
- Experience using Portainer for container and Kubernetes management.
- Advanced knowledge of Grafana and other enterprise-grade observability tools.
- Experience managing large-scale Kubernetes clusters (200+ nodes) is highly preferred.
- Prior experience supporting production-grade, high-availability platforms and environments.
Why Join Us?
- Help shape and operate mission-critical, modern Kubernetes infrastructure.
- Be part of a team focused on platform reliability, observability, and secure operations.
- Contribute to and influence the evolution of deployment and automation practices (GitOps, IaC).
- Access cutting-edge tools, industry best practices, and continuous learning.