About the Role

Own the deployment, scaling and hardening of our Kubernetes-based infrastructure. Automate end-to-end provisioning, ensure security and high availability, and troubleshoot production incidents.

Key Responsibilities

Kubernetes: Deploy, manage & optimize clusters (on-prem, EKS/GKE/AKS)
IaC & GitOps: Automate with Terraform, Helm charts & Argo CD (or similar)
CI/CD: Build/maintain pipelines (Jenkins, GitHub Actions, etc.)
Monitoring: Implement Prometheus, Grafana & ELK for metrics, logs & alerts
Troubleshooting: Diagnose container networking, storage & performance issues
Security: Enforce RBAC, network policies & image-scanning best practices
DR & Optimization: Define backup/restore strategies and cost-control measures
Collaboration: Partner with dev teams on containerization and CI/CD workflows

Required Qualifications

3-5 yrs in infrastructure, SRE or DevOps roles
Hands-on Kubernetes (cluster lifecycle, Helm, CRDs)
Linux administration & Bash scripting; networking tools (ip, netstat, tcpdump)
IaC with Terraform/Ansible; deep Docker knowledge
Monitoring with Prometheus/Grafana & ELK
Automation scripting in Bash, Python or Go; Git proficiency; production debugging

Preferred Skills

Managed K8s services (EKS/GKE/AKS)
Advanced IaC/GitOps (Argo CD, Terraform, Helm)
Service mesh (Istio, Linkerd)
Container security (Trivy, Clair)
Custom tooling via Bash/Python automation