We are seeking a Site Reliability Engineer (SRE) to ensure our multi-cloud networking platform meets or exceeds the stringent reliability, performance, and availability targets our enterprise customers demand. This is not a traditional operations role: you will apply a software engineering mindset to solve complex infrastructure challenges and automate solutions at scale. You will be the guardian of our production environment, responsible for the uptime of our services, and the architect of the systems that allow us to scale with confidence. Your work is critical to building and maintaining the trust of our customers.

Responsibilities:

Define and Manage Reliability: Establish and own the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) that define the reliability of our platform. Join the on-call rotation to manage and resolve production incidents, and contribute to a blameless post-incident analysis culture.

Build and Own the Observability Stack: Design, implement, and manage our complete observability stack, leveraging tools like Prometheus for metrics, Grafana for visualization, Elasticsearch for logging, and Jaeger/OpenTelemetry for distributed tracing to provide end-to-end visibility into our distributed system.

Automate Everything: Write robust automation and tooling in Python or Go to eliminate manual operational tasks, from incident response to infrastructure provisioning.

Infrastructure as Code (IaC): Use Terraform and Ansible to manage our multi-cloud infrastructure as code, ensuring our environments are consistent, repeatable, and auditable.

Kubernetes and Cloud Operations: Manage, troubleshoot, and scale our Kubernetes clusters across our multi-cloud footprint (AWS, Azure, GCP). You will be the expert on running our application reliably in a containerized environment.

CI/CD and Release Engineering: Collaborate with development teams to enhance our CI/CD pipelines so that every release is safe, reliable, and quick to deploy.

Required Qualifications:

3-5+ years of experience in a Site Reliability Engineering (SRE), DevOps, or similar infrastructure-focused software engineering role.

Strong programming and automation skills in Python or Go.

Deep, hands-on expertise with a modern observability stack, including Prometheus, Grafana, and the ELK Stack (Elasticsearch, Logstash/Fluentd, Kibana).

Proven experience with Infrastructure as Code (Terraform) and configuration management (Ansible).

In-depth knowledge of running, managing, and troubleshooting applications on Kubernetes in a production, multi-cloud environment.

A rigorous, data-driven approach to reliability and a deep understanding of distributed systems, their failure modes, and how to make them resilient.

Preferred Qualifications:

Experience with distributed tracing using Jaeger or OpenTelemetry.

A strong understanding of cloud networking concepts (VPCs, subnets, routing, security groups).

Experience defining and tracking SLOs and error budgets.

Experience in a fast-paced startup environment.

Relevant certifications such as Certified Kubernetes Administrator (CKA) or cloud provider certifications (AWS, Azure, GCP).

Tata Communications Limited