SRE II - Observability & Reliability

5 years

0 Lacs

Posted: 15 hours ago | Platform: Glassdoor

Work Mode: On-site

Job Type: Part Time

Job Description

Job Summary


We are seeking a Senior Software Engineer to join our Site Reliability Engineering (SRE) team, with a focus on observability and reliability. As a key member of the team, you will play a critical role in ensuring the performance, stability, and availability of our applications and systems, concentrating on application performance management, observability, and reliability of the platform.
The Senior Software Engineer will be responsible for the design, implementation, and maintenance of our observability and reliability infrastructure, with a primary focus on the ELK stack (Elasticsearch, Logstash, and Kibana). The role involves configuring, fine-tuning, and automating alerts; integrating Elastic solutions with other tools and applications; generating reports; and optimizing the observability and monitoring systems.

Key Duties & Responsibilities

1. Collaborate with cross-functional teams to define and implement observability and reliability standards and best practices.
2. Design, deploy, and maintain the ELK stack for log aggregation, monitoring, and analysis.
3. Develop and maintain alerts and monitoring systems, ensuring early detection of issues and rapid incident response.
4. Create, customize, and maintain dashboards in Kibana for different stakeholders.
5. Collaborate with software development teams to identify performance bottlenecks and recommend solutions.
6. Automate manual tasks and workflows to streamline observability and reliability processes.
7. Conduct regular system and application performance analysis and optimization, covering effective automation and tooling, capacity planning, security practices and compliance adherence, documentation and knowledge sharing, and disaster recovery and backup.
8. Generate and deliver detailed reports on system performance and reliability metrics.
9. Stay up to date with industry trends and best practices in observability and reliability engineering.


Qualifications/Skills/Abilities
Minimum Requirements


Formal Education
Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).


Experience (type & duration)
5+ years of experience in Site Reliability Engineering, Observability & Reliability, DevOps.


Skills
  • Proficiency in configuring and maintaining the ELK stack (Elasticsearch, Logstash, Kibana) is mandatory.
  • Strong scripting and automation skills, with expertise in Python, Bash, or similar languages.
  • Experience designing data structures with Elasticsearch indices.
  • Experience writing data ingestion pipelines with Logstash.
  • Experience with infrastructure as code (IaC) and configuration management tools (e.g., Ansible, Terraform).
  • Hands-on experience with cloud platforms (AWS preferred) and containerization technologies (e.g., Docker, Kubernetes).
  • Telecom domain expertise is good to have but not mandatory.
  • Strong problem-solving skills and the ability to troubleshoot complex issues in a production environment.
  • Excellent communication and collaboration skills.


Accreditation/certifications/licenses
Relevant certifications (e.g., Elastic Certified Engineer) are a plus.
