We are seeking a passionate and skilled Site Reliability Engineer (SRE) to join our team. In this role, you will ensure high availability, performance, and security of our systems while proactively identifying and resolving reliability issues. You will be responsible for monitoring, troubleshooting, automation, and building resilient infrastructure that supports millions of users globally.

Key Responsibilities

Monitor, troubleshoot, and resolve live-site issues to maintain uptime, performance, and security.
Define and manage SLIs, SLOs, and error budgets to ensure reliable user experiences.
Consolidate infrastructure monitoring and alerting into unified systems (e.g., Prometheus + Alertmanager) while enhancing alerts with contextual information (dashboards, runbooks, severity levels).
Continuously improve infrastructure by upgrading and patching OS, databases, networking, and related components.
Optimize on-call processes and lead incident response, root-cause analysis, and post-mortems.
Build self-healing systems, automate repetitive/manual tasks, and proactively identify opportunities to improve uptime.

What You Will Bring

Strong SRE mindset: proactive in spotting problems, performance bottlenecks, and areas for improvement.
Hands-on expertise with observability tools and strong troubleshooting skills in distributed systems.
Ability to work in a fast-paced, results-driven environment that demands operational excellence.
Strong problem-solving skills with a track record of developing and implementing solutions.
Excellent organizational and multitasking skills to handle multiple complex priorities under tight deadlines.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field.
2+ years of experience managing distributed systems & web applications with high uptime requirements (10M+ users preferred).
Proficiency in Linux and LAMP stack environments.
Experience with observability tools (e.g., Prometheus, Grafana, New Relic, CloudWatch, ELK, Zabbix).
Experience with Infrastructure as Code (IaC) tools (e.g., Ansible, Terraform, Terragrunt).
Strong ownership mindset, bias for action, and ability to deliver results end-to-end.
Excellent written and verbal communication skills.

Preferred Qualifications

Familiarity with cloud computing and the AWS ecosystem.
Programming experience to automate infrastructure tasks.
Flexibility to work during off-schedule hours (evenings/weekends) if required.

Chargepoint Technologies India
