Sr. DevOps Engineer & Site Reliability Engineer

7 years

20 - 30 Lacs

Posted: 13 hours ago | Platform: Glassdoor

Work Mode

On-site

Job Type

Full Time

Job Description

Senior DevOps & Site Reliability Engineer (DevOps + SRE)

About the Role

We are seeking a highly experienced Senior DevOps & Site Reliability Engineer to support and scale our cloud-native, containerized IoT platform built on AWS. You will work closely with the Technical Manager to automate infrastructure, build CI/CD pipelines, manage large-scale deployments, and ensure the platform’s reliability, security, and performance.

This role requires deep hands-on expertise in AWS, Docker/Kubernetes, serverless workflows, infrastructure automation, scripting (Python), and IoT-scale distributed systems reliability.

Key Responsibilities

DevOps Responsibilities

· Design, implement, and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, or GitLab CI.

· Develop and automate deployment workflows following DevOps strategy and best practices.

· Manage Docker containerization, including multi-stage builds, optimization, and image security.

· Orchestrate containers using Kubernetes (EKS) or AWS ECS (Fargate/EC2).

· Manage and optimize ECR for image storage and versioning.

· Implement Infrastructure-as-Code using AWS CDK, Terraform, or CloudFormation.

· Build automated deployment workflows for backend, microservice, and IoT services.

· Support serverless architectures using AWS Lambda, Step Functions, EventBridge, etc.

· Implement secure secrets management using AWS IAM, KMS, and Secrets Manager.

· Handle configuration, environment management, and zero-downtime deployment strategies.

Site Reliability Engineering (SRE) Responsibilities

· Build and maintain monitoring, logging, and tracing pipelines using CloudWatch, Grafana, Prometheus, X-Ray, and OpenTelemetry.

· Define and implement SLIs, SLOs, error budgets, and reliability dashboards.

· Ensure high availability, resilience, and performance of all systems in production.

· Conduct incident management, root cause analysis, and post-incident reviews.

· Optimize cost, compute utilization, autoscaling policies, and failover strategies.

· Implement cloud reliability patterns: circuit breakers, retries, throttling, and canary and blue-green deployments.

· Manage production readiness, release safety, and operational excellence.

Required Skills & Qualifications

· 7+ years of experience in DevOps, SRE, or Cloud Infrastructure roles.

· Deep hands-on experience with:

o Docker containerization & orchestration

o Kubernetes (EKS) and/or AWS ECS

o AWS ECR (image lifecycle management)

o AWS IoT Core, Lambda, API Gateway, VPC, S3, IAM, CloudWatch

· Strong scripting experience, with Python expertise preferred (Bash is a plus).

· Proficiency with GitHub for code management, automation, and CI/CD workflows.

· Strong background in Infrastructure-as-Code: AWS CDK, Terraform, or CloudFormation.

· Experience with reliability engineering frameworks, large-scale distributed systems, and HA/DR design.

· Knowledge of serverless computing and event-driven architectures.

· Strong understanding of cloud security, identity management, and compliance.

Job Type: Full-time

Pay: ₹2,000,000.00 - ₹3,000,000.00 per year

Work Location: In person
