Location: Whitefield, Bengaluru (5 days WFO)
Experience: 6+ Years
Role Overview:
We are looking for a skilled DevOps & Infrastructure Engineering Lead to design, scale, and secure cloud-native infrastructure using AWS, Kubernetes (EKS), Terraform, and modern observability tools. The role demands deep hands-on experience and leadership in building resilient, secure, and automated systems that power high-scale production workloads.
Key Responsibilities:
Infrastructure & Platform Engineering
- Architect and manage highly available Kubernetes clusters using AWS EKS with optimized auto-scaling and secure networking.
- Leverage Docker for containerization and implement secure CI/CD workflows using tools like Jenkins or Devtron.
- Define and manage infrastructure using Terraform, maintaining reusable and version-controlled modules.
- Build, monitor, and optimize core AWS services: ALB/NLB, Security Groups, IAM (Roles & Policies), VPC, Route 53, KMS, SSM, Patch Manager, and more.
- Ensure infrastructure meets high-availability, cost-efficiency, and security standards across environments.
Security, Governance & Compliance
- Enforce CIS Benchmarking standards for Kubernetes, EC2, IAM, and other AWS resources.
- Implement container image hardening practices (e.g., scanning for CVEs, minimal base images, signed images, SBOM integration).
- Configure cloud-native security controls (e.g., KMS, IAM boundaries, GuardDuty, Security Hub) to enforce access control and encryption policies.
- Collaborate with InfoSec to manage vulnerability scans, incident playbooks, VAPT responses, and compliance posture reporting.
Observability & Monitoring
- Drive full-stack observability using Datadog, CloudWatch, Grafana, and OpenTelemetry (OTEL)-based open-source pipelines.
- Build monitoring solutions to track infrastructure health, performance, latency, and alerts with actionable dashboards.
- Define and maintain SLIs, SLOs, error budgets, and automated alert escalation processes.
Data Platform Management
- Provision and manage highly available data stores: MongoDB, MySQL/Aurora, Redis, DocumentDB, ClickHouse.
- Design and implement backup policies, patching processes, disaster recovery (RTO/RPO), and scaling strategies for all critical data systems.
Tooling & Automation
- Own end-to-end CI/CD workflows using Jenkins, Devtron, or similar platforms.
- Automate patching, configuration drift detection, resource tagging, and access lifecycle workflows.
- Create internal tooling and scripts to manage infrastructure, cost audits, access reviews, and deployment flows.
Team Leadership
- Lead and mentor a team of DevOps engineers, fostering ownership, growth, and collaboration.
- Drive technical architecture decisions, enforce best practices, and streamline operational playbooks.
- Conduct RCA for outages/incidents and lead cross-functional response improvements.
Required Skills:
- 6+ years of experience in DevOps, SRE, or Infrastructure Engineering roles, with 2+ years in a leadership or staff capacity.
- Strong hands-on expertise in AWS, Terraform, Kubernetes (EKS), Docker, and Linux internals.
- Deep understanding of cloud security, IAM practices, encryption, CIS benchmarks, and image hardening strategies.
- Experience with observability tools like Datadog, Grafana, CloudWatch, and OTEL-based open-source alternatives.
- Proven experience managing relational and NoSQL databases: MongoDB, MySQL/Aurora, Redis, DocumentDB, ClickHouse.
- Experience implementing and managing infrastructure under compliance standards like SOC 2, ISO 27001, or similar.
Preferred Qualifications:
- Certifications: AWS DevOps Professional, AWS Solutions Architect, HashiCorp Terraform Associate.
- Experience with automated compliance audits, security event correlation, and incident management tools.
- Familiarity with tools like Cloudflare, Devtron, Prometheus, and Fluent Bit.
Why Join Us:
- Lead platform modernization and observability efforts for a rapidly scaling organization.
- Work in an engineering-first culture that values automation, security, and system design excellence.
If you're passionate about platform reliability, automation at scale, and cloud-native security, this role is for you.