Site Reliability Engineer

5 - 10 years

15 - 30 Lacs

Posted: 8 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Description:

About the Role


We are looking for a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join our team and play a key role in building and scaling the infrastructure of an advertising platform. The ideal candidate will have a strong background in system design, automation, CI/CD, monitoring, capacity planning, and cloud infrastructure (AWS) — with a passion for creating reliable, scalable, and highly available systems.

Requirements:

Required Skills & Qualifications

8+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
Strong programming and scripting skills in Python, Go, Bash (or similar), with a focus on automation and tooling.
Expertise in CI/CD pipelines (Jenkins or similar) and infrastructure-as-code (Terraform, CloudFormation).
Hands-on experience with AWS services (EC2, RDS, S3, VPC, IAM, CloudWatch, etc.) for infrastructure design and operations.
Proficiency in Prometheus (or other monitoring/alerting systems) and incident management practices.
Solid understanding of system design, distributed systems, and large-scale architecture.
Strong background in capacity planning, performance tuning, and load testing.
Excellent problem-solving, communication, and collaboration skills.

Job Responsibilities:

Key Responsibilities

System Design & Architecture
Design, build, and maintain scalable, resilient, and highly available infrastructure and services for our advertising platform.
Collaborate with engineering teams to ensure new products and features are built with reliability, scalability, and performance in mind.
Implement redundancy, failover strategies, and automated recovery mechanisms to minimize downtime and enhance service reliability.
Leverage AWS services (e.g., EC2, RDS, S3, Lambda, VPC, IAM) to design and optimize infrastructure.

Automation & Tooling
Develop automation frameworks and tools to improve CI/CD pipelines, infrastructure provisioning, and operational workflows.
Leverage strong programming and scripting skills (Python, Go, Bash) to build scalable automation solutions, reducing manual intervention.
Drive initiatives for end-to-end automation, optimizing efficiency and reducing human error.

Monitoring & Incident Management
Implement and maintain robust monitoring systems (e.g., Prometheus, Grafana) with real-time alerting on key system metrics (latency, availability, etc.).
Lead incident response, troubleshooting, and root cause analysis, ensuring learnings are captured through post-mortem reviews.
Collaborate with support and engineering teams to reduce MTTR (Mean Time to Recovery) and prevent recurring issues.

Performance Optimization & Capacity Planning
Analyze system performance and recommend improvements for latency, throughput, and cost optimization.
Conduct capacity planning and load testing to ensure infrastructure can handle growth and peak traffic demands.
Identify and eliminate bottlenecks to improve reliability and efficiency.

Collaboration & Knowledge Sharing
Work closely with engineers, product managers, and stakeholders to align system reliability with business goals.
Document best practices, system designs, and incident response procedures to improve team efficiency and knowledge sharing.
Mentor and provide technical guidance to junior engineers, promoting a culture of continuous learning and improvement.

What We Offer:

Exciting Projects:

Collaborative Environment:

Work-Life Balance:

Professional Development:

Excellent Benefits:

Fun Perks:

Globallogic

Software Development

Santa Clara CA