Infrastructure & SRE Lead

Posted: 4 weeks ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Position Title: Infrastructure & SRE Lead
Location: Bangalore (On-site)

Role Overview

We’re hiring an Infrastructure & AIOps Lead to champion the reliability, scalability, and cost-efficiency of our AWS platform, observability stack, and data warehouse. In this role, you’ll work hand-in-hand with backend, AI, and analytics teams (including mentoring our DevOps and Data-Ops engineers) to build AI-infused automation assistants, define and maintain runbooks, enforce SLOs, and continuously optimize both application infrastructure and our Redshift/Metabase data platform. You’ll leverage AI-assisted coding tools to accelerate routine ops workflows, own Terraform-driven deployments, and partner with stakeholders across product and engineering to keep our systems robust at scale.

Key Responsibilities

Cloud Infrastructure & Automation
- Take an automation-first approach to building an AI DevOps agent that reduces MTTD and MTTR
- Design and maintain Terraform-based IaC for AWS resources (ECS, VPCs, RDS, SageMaker) and manage MongoDB Atlas clusters
- Optimize cost and performance through right-sizing, reserved instances, autoscaling, and continuous infrastructure reviews

On-call Reliability & Incident Management
- Serve as the primary PagerDuty escalation lead; refine alert rules and escalation policies
- Develop and maintain runbooks and playbooks for common incidents (database failovers, service crashes, latency spikes)
- Conduct post-mortems, track error budgets, and drive reliability improvements

Monitoring & Observability
- Define SLIs/SLOs for critical services and build dashboards in New Relic and Coralogix
- Instrument logging, tracing, and metrics pipelines; ensure high-fidelity alerts without noise

CI/CD & Deployment
- Design, implement, and maintain GitHub Actions CI/CD pipelines that automate unit testing and enable continuous releases
- Collaborate on blue/green or canary release strategies to minimize user impact

Data Platform & Analytics Support
- Oversee our data-ops function (Redshift data warehouse + Metabase)
- Ensure query performance, cost-optimization of the warehouse, and robust dashboard delivery for the analytics team

Knowledge Sharing & Mentorship
- Mentor team members on best practices in reliability, observability, and automation
- Lead regular tech talks on infrastructure, security, and cost management
- Maintain and evolve our central runbook repository and documentation

Must-Have Qualifications
- 5+ years of hands-on experience owning cloud infrastructure, preferably on AWS (ECS, RDS, S3, IAM, VPC)
- Proven track record in SRE or DevOps: on-call rotations, runbook development, incident response
- Strong IaC skills (Terraform, CloudFormation, or similar) and automation of CI/CD pipelines (GitHub Actions)
- Deep expertise in monitoring & observability (New Relic, Coralogix) and alerting (PagerDuty)
- Solid understanding of container orchestration (ECS), networking, and security best practices
- Proficient programmer (Python or Go) capable of writing automation scripts and small tools
- Familiarity with AI-assisted coding workflows (e.g., GitHub Copilot, Cursor) and comfort using AI to accelerate routine tasks
- Excellent communicator who thrives in a flat, high-ownership environment

Nice-to-Have
- Experience building or integrating AI-powered automation assistants to streamline infra/data-ops workflows
- Hands-on practice with LLMs or AI frameworks for operational tooling (prompt engineering, embeddings, etc.)
- Prior involvement in ML/AI infrastructure (SageMaker, model-serving frameworks)
- Experience with large-scale database operations (Redshift, MongoDB Atlas) and caching (Redis)
- Familiarity with message queues and task runners (Celery, RabbitMQ, or similar)
- Contributions to open-source DevOps/SRE tooling
