Posted: 11 hours ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description

Role Overview

We are looking for an experienced Kubernetes engineer with strong expertise in Kubernetes clusters, cloud-native technologies, storage integration, and performance optimisation. The ideal candidate has hands-on experience designing, deploying, and managing large-scale Kubernetes environments across on-prem and cloud platforms, and troubleshooting complex containerised workloads.


Key Responsibilities

Cluster Management & Deployment

  • Provision and manage Kubernetes clusters using kubeadm, RKE2, and Cluster API across cloud platforms (AWS, Azure, GCP, OpenStack). 
  • Deploy, scale, and upgrade applications using Kubernetes best practices (rolling updates, probes, HPA, VPA). 
  • Configure node scheduling strategies using taints, tolerations, and affinity rules. 

Application Deployment & Troubleshooting

  • Debug CrashLoopBackOff and pod failures using kubectl logs, events, and resource monitoring. 
  • Troubleshoot networking, persistent volumes, and service exposure issues (ClusterIP, NodePort, LoadBalancer, Ingress). 
  • Debug application routing, including multi-path routing, through APISIX and NGINX ingress.
  • Handle application scaling and high-traffic scenarios using autoscalers. 

Storage & Data Management

  • Integrate Ceph storage with Kubernetes via CSI drivers for block and filesystem provisioning. 
  • Troubleshoot PersistentVolume (PV) and PersistentVolumeClaim (PVC) issues. 

Observability & Performance

  • Deploy and configure monitoring solutions such as Prometheus and Metrics Server. 
  • Benchmark cluster and workload performance (CPU, memory, networking). 
  • Enable log collection and analysis for multi-container pods. 

Security & Networking

  • Manage authentication and RBAC policies within Kubernetes. 
  • Configure isolation for virtual Kubernetes clusters (vcluster). 
  • Handle registry authentication (AWS ECR, private registries) using image pull secrets. 

Specialized Workloads

  • Deploy and manage GPU workloads using NVIDIA GPU Operator. 
  • Enable GPU scheduling and resource allocation for AI/ML workloads. 

Operations & Maintenance

  • Troubleshoot faulty nodes (on-prem / cloud) including CPU, memory, disk, and kubelet health. 
  • Work on service routing and ingress configurations, and debug cloud load balancer and firewall issues.
  • Perform rolling upgrades and ensure zero-downtime deployments. 


Required Skills

  • Strong expertise in Kubernetes administration and cloud-native deployments. 
  • Hands-on experience with kubeadm, RKE2, Cluster API, and Terraform for cluster provisioning. 
  • Knowledge of storage integration with Ceph and CSI drivers. 
  • Experience with monitoring and observability tools (Prometheus, Grafana, Metrics Server). 
  • Strong debugging skills for pod crashes, networking issues, and persistent storage problems. 
  • Knowledge of NGINX ingress, APISIX, and traffic routing. 
  • Understanding of RBAC, security groups, and IAM policies in Kubernetes & cloud. 
  • Experience with GPU workloads in Kubernetes. 
  • Familiarity with CI/CD pipelines for Kubernetes deployments is a plus. 

 

Preferred Qualifications

  • 4+ years of hands-on experience in Kubernetes roles. 
  • Experience in both managed (EKS, AKS, GKE) and on-prem Kubernetes clusters. 
  • Strong scripting skills (Bash, Python, Go – preferred). 
  • Prior experience with infrastructure-as-code tools like Terraform, Helm, and Ansible. 
  • Exposure to multi-cluster and multi-tenant environments. 


MulticoreWare Inc

Software Development

San Jose CA
