Posted: 13 hours ago
On-site
Full Time
Senior DevOps & Site Reliability Engineer (DevOps + SRE)
About the Role
We are seeking a highly experienced Senior DevOps & Site Reliability Engineer to support and scale our cloud-native, containerized IoT platform built on AWS. You will work closely with the Technical Manager to automate infrastructure, build CI/CD pipelines, manage large-scale deployments, and ensure the platform’s reliability, security, and performance.
This role requires deep hands-on expertise in AWS, Docker/Kubernetes, serverless workflows, infrastructure automation, scripting (Python), and IoT-scale distributed systems reliability.
Key Responsibilities
DevOps Responsibilities
· Design, implement, and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, or GitLab CI.
· Develop and automate deployment workflows following DevOps strategy and best practices.
· Manage Docker containerization, including multi-stage builds, optimization, and image security.
· Orchestrate containers using Kubernetes (EKS) or AWS ECS (Fargate/EC2).
· Manage and optimize ECR for image storage and versioning.
· Implement Infrastructure-as-Code using AWS CDK, Terraform, or CloudFormation.
· Build automated workflows for backend, microservices, and IoT services deployment.
· Support serverless architectures using AWS Lambda, Step Functions, EventBridge, etc.
· Implement secure secrets management using AWS IAM, KMS, and Secrets Manager.
· Handle configuration, environment management, and zero-downtime deployment strategies.
Site Reliability Engineering (SRE) Responsibilities
· Build and maintain monitoring, logging, and tracing pipelines using CloudWatch, Grafana, Prometheus, X-Ray, and OpenTelemetry.
· Define and implement SLIs, SLOs, error budgets, and reliability dashboards.
· Ensure high availability, resilience, and performance of all production systems.
· Conduct incident management, root cause analysis, and post-incident reviews.
· Optimize cost, compute utilization, autoscaling policies, and failover strategies.
· Implement cloud reliability patterns: circuit breakers, retries, throttling, and canary and blue-green deployments.
· Manage production readiness, release safety, and operational excellence.
Required Skills & Qualifications
· 7+ years of experience in DevOps, SRE, or Cloud Infrastructure roles.
· Deep hands-on experience with:
o Docker containerization & orchestration
o Kubernetes (EKS) and/or AWS ECS
o AWS ECR (image lifecycle management)
o AWS IoT Core, Lambda, API Gateway, VPC, S3, IAM, CloudWatch
· Strong scripting experience; Python expertise preferred (Bash is a plus).
· Proficiency with GitHub for code management, automation, and CI/CD workflows.
· Strong background in Infrastructure-as-Code: AWS CDK, Terraform, or CloudFormation.
· Experience with reliability engineering frameworks, large-scale distributed systems, and HA/DR design.
· Knowledge of serverless computing and event-driven architectures.
· Strong understanding of cloud security, identity management, and compliance.
Job Type: Full-time
Pay: ₹2,000,000.00 - ₹3,000,000.00 per year
Work Location: In person
TalentXplore