Position Overview:
The Site Reliability Engineer (SRE) will play a key role in building and managing Kubernetes platforms that support cloud-native applications and services. This position focuses on architecting and operating reliable, scalable, and secure Kubernetes-based platforms on AWS, ensuring high availability and performance while optimizing costs and increasing automation. The ideal candidate will have hands-on experience with AWS infrastructure, Kubernetes platform creation, Helm charts, Karpenter scaling, and the Istio service mesh.
Key Responsibilities:
Kubernetes Platform Creation:
Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms. Ensure clusters are optimized for production workloads, providing high resilience and operational efficiency.
AWS Infrastructure Management:
Build, manage, and optimize AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more. Implement best practices for cost management, scaling, and security within AWS.
Helm Management:
Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters. Create, maintain, and manage Helm charts for production-ready deployments.
Karpenter Implementation:
Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands.
Istio Service Mesh Management:
Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters. Enable fine-grained traffic management, service discovery, and policy enforcement.
Platform Automation & Scaling:
Automate the deployment, scaling, and management of infrastructure and applications. Work with CI/CD pipelines to ensure a seamless flow from development to production with minimal downtime.
Incident Management & Troubleshooting:
Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security in a timely and effective manner.
Security & Compliance:
Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks.
Documentation & Knowledge Sharing:
Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices. Promote knowledge sharing across teams.
Required Qualifications:
- 3+ years of experience with Kubernetes (K8s), Helm, Karpenter, and Istio.
- 5+ years of experience with infrastructure-as-code tools like Terraform, Chef, or Ansible.
- 5+ years of experience with serverless computing (AWS Lambda, API Gateway) and microservices architecture.
- Experience with multi-region cloud environments.
- Proven experience with AWS (EC2, RDS, S3, CloudFormation, IAM, etc.) and solid understanding of cloud-native architectures.
- Strong expertise in Kubernetes platform creation, management, and optimization (e.g., setting up highly available clusters, networking, and storage).
- Hands-on experience with Helm for Kubernetes application deployment and management.
- Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimizing resource usage.
- Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features.
- Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI, Terraform, Ansible, Spinnaker).
- Strong scripting and automation skills in Python, Bash, or Go for infrastructure management and platform automation.
- Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack.
Preferred Qualifications:
- Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks).
- Familiarity with Docker and containerization principles.
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent professional experience).
Certifications (Preferred):
CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or AWS Certified DevOps Engineer are highly desirable.