Site Reliability Engineer (SRE)

0 - 7 years

0 Lacs

Posted: 5 days ago | Platform: Indeed

Work Mode

On-site

Job Type

Full Time

Job Description

Platform & System Reliability (SRE)

  • Build and maintain highly available, scalable, and fault-tolerant systems in GCP and other cloud environments.
  • Design and implement automated solutions to eliminate toil and improve operational efficiency.
  • Develop, refine, and maintain monitoring, observability, and alerting systems across infrastructure and services.
  • Instrument platforms with OpenTelemetry for metrics, logs, and traces.
  • Own incident response processes, including on-call participation, root-cause analysis, and post-incident improvement actions.
  • Build and support CI/CD pipelines, GitOps workflows, and infrastructure-as-code deployments (e.g., Terraform).

Data Reliability Engineering (DRE)

  • Ensure reliability, accuracy, and availability of batch, streaming, and real-time data pipelines.
  • Instrument data flows with data observability patterns, including lineage (OpenLineage), freshness, completeness, and quality checks.
  • Monitor data systems end-to-end using automated alerting and anomaly detection.
  • Contribute to data SLOs, SLIs, and error budgets that measure reliability and drive continuous improvement.
  • Improve performance, scalability, and resilience across data storage systems (SQL, NoSQL, lakehouse, analytics services).

Qualifications

  • 5–7 years in Site Reliability Engineering, Data Engineering, Platform Engineering, or similar roles.
  • Strong experience in GCP (preferred) plus exposure to OCI/Azure.
  • Proficiency in Python, Go, Bash, or similar languages for automation and tooling.
  • Hands-on experience with containerization, service mesh, and distributed systems design.
  • Expertise with observability platforms and telemetry standards (Prometheus, Grafana, Cloud Monitoring, OpenTelemetry).
  • Solid understanding of networking, Linux fundamentals, and scalable system design.
  • Familiarity with modern data platforms (BigQuery, Kafka, Spark, data lakes) and data reliability concepts.
  • Experience with IaC practices (Terraform, Ansible) and CI/CD systems.
  • Excellent communication skills for partnering with platform, data, and application teams.
  • Ability to work with team members and clients to assess needs, provide assistance, and resolve problems.
  • Strong problem-solving and analytical skills.
  • Desire to understand why things work the way they do.
  • Ability to present and explain technical concepts to business audiences.
  • All other duties as assigned.

This job posting will remain open for a minimum of 72 hours and on an ongoing basis until filled.

Job: Information Technology

Primary Location: India-Karnataka-Bengaluru

Schedule: Full-time

Travel: No

Req ID: 254849

Job Hire Type: Experienced
