Senior Site Reliability Engineer

Entomo Gtdic

4 - 6 years

10 - 13 Lacs

Bengaluru

Posted: 5 months ago

Skills Required

Performance Optimization, Observation Monitoring, Root Cause Analysis, CloudOps, AWS, Monitoring System Performance

Work Mode

Hybrid

Job Type

Full Time

Job Description

entomo is an Equal Opportunity Employer. The company promotes and supports a diverse workforce at all levels. It ensures that its associates, potential hires, third-party support staff, and suppliers are not discriminated against, directly or indirectly, on the basis of their colour, creed, caste, race, nationality, ethnicity or national origin, marital status, pregnancy, age, disability, religion or similar philosophical belief, sexual orientation, gender or gender reassignment, etc.

Summary:

We are seeking a skilled Site Reliability Engineer (SRE) to join our team. In this role, you will bridge the gap between development and operations by applying software engineering principles to infrastructure and operations tasks. Your primary focus will be ensuring the reliability, availability, performance, and scalability of our production systems while minimizing manual operational work through automation and enhancing system resilience.

Position Overview:

The Site Reliability Engineer will work closely with development and operations teams to design, implement, and maintain highly reliable systems. You will be instrumental in establishing best practices for observability, incident response, and infrastructure management. Your expertise will help reduce operational overhead, improve system performance, and ensure seamless deployments through CI/CD pipelines.

Qualifications

Required

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
3+ years of experience in SRE, DevOps or similar roles
Strong proficiency with Kubernetes (K8s) and Docker containerization
Experience with the ELK stack (Elasticsearch, Logstash, Kibana) for logging and monitoring
Good to have: understanding of Java programming and troubleshooting Java applications
Working knowledge of SQL and MongoDB databases
Familiarity with Angular for frontend monitoring and diagnostic tooling
Strong understanding of system architecture, cloud infrastructure, and networking
Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible)
Demonstrated experience with monitoring and observability platforms
Excellent problem-solving skills and ability to troubleshoot complex systems
Outstanding verbal and written communication skills

Preferred Skills:

Must have: experience with the AWS public cloud platform. Good to have: knowledge of and experience with Azure and GCP.
Knowledge of CI/CD tools (Jenkins, GitLab CI, GitHub Actions)
Familiarity with service mesh technologies (e.g., Istio)
Experience with scripting languages (Python, Bash)
Understanding of distributed systems and microservices architecture
Experience implementing SLOs, SLIs, and SLAs
Knowledge of security best practices
Certification in relevant technologies (CKA, AWS, etc.)

Roles and Responsibilities:

System

Design, implement, and maintain highly available and scalable infrastructure
Define and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets
Conduct capacity planning and performance optimization for critical systems
Implement strategies to improve system resilience and fault tolerance
Perform regular system health checks and proactive maintenance
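
For candidates less familiar with the terms, the relationship between an SLO, its SLI, and the resulting error budget can be sketched with a little arithmetic. This is only an illustration: the `ErrorBudget` class, the 99.9% target, and the request counts are hypothetical and do not describe entomo's actual objectives or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Request-based error budget for one SLO window (illustrative only)."""

    slo_target: float      # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int    # requests observed in the window
    failed_requests: int   # requests that violated the SLI

    def __post_init__(self) -> None:
        if not 0 < self.slo_target < 1:
            raise ValueError("slo_target must be strictly between 0 and 1")

    @property
    def sli(self) -> float:
        """Measured success ratio over the window."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Number of failures the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the budget still unspent (negative if overspent)."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.allowed_failures


# Hypothetical numbers: a 99.9% SLO over one million requests, 400 failures.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(f"SLI: {budget.sli:.4%}")
print(f"Allowed failures: {budget.allowed_failures:.0f}")
print(f"Budget remaining: {budget.budget_remaining:.0%}")
```

A negative `budget_remaining` means the window has already overspent its budget, which many teams treat as a signal to prioritise reliability work over new releases.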

Monitoring and Observability

Deploy and maintain comprehensive monitoring solutions using the ELK stack and other tools
Create and refine dashboards for system metrics, logs, and application performance
Set up effective alerting systems with appropriate thresholds to minimize alert fatigue
Implement distributed tracing to understand system behavior and identify bottlenecks
Ensure proper logging and telemetry across all services

Incident Management and Response

Lead incident response efforts, including troubleshooting, mitigation, and resolution
Conduct thorough post-incident reviews to identify root causes and preventive measures
Document incidents, resolutions, and knowledge for future reference
Develop and maintain runbooks for common operational procedures
Participate in on-call rotation to provide 24/7 coverage for critical systems

Automation and Toil Reduction

Identify and eliminate toil through systematic automation
Develop automated solutions for recurring operational tasks
Implement Infrastructure as Code (IaC) practices for consistent environment provisioning
Create self-service tools for developers to reduce operational dependencies
Automate testing and deployment processes for improved efficiency

CI/CD Pipeline Management

Design and maintain reliable CI/CD pipelines for continuous deployment
Implement automated testing within deployment workflows
Ensure smooth and reliable deployment processes with minimal disruption
Develop strategies for canary deployments and feature flagging
Create rollback mechanisms for quick recovery from failed deployments

Infrastructure Management

Manage Kubernetes clusters and containerized applications
Oversee configuration management and version control for infrastructure
Implement security best practices and compliance requirements
Optimize resource utilization and cost efficiency
Ensure proper backup and disaster recovery procedures

Collaboration and Knowledge Sharing

Work closely with development teams to improve application reliability
Provide guidance on architectural decisions from a reliability perspective
Conduct regular knowledge sharing sessions and documentation updates
Train team members on SRE practices and tools
Contribute to the development of SRE culture across the organization

Working Environment

Collaborative team environment focused on continuous improvement
Opportunity to work with cutting-edge technologies and solve complex problems
Balance of project work and operational responsibilities
Culture that values automation, innovation, and reliability
Emphasis on learning and professional development

Success Metrics

Improvement in system availability and reliability metrics
Reduction in mean time to detect (MTTD) and mean time to resolve (MTTR) incidents
Decreased frequency of production incidents and outages
Increased automation coverage and reduced manual operational work
Successful implementation of SLOs and monitoring systems
Positive feedback from development teams on collaboration and support
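
As a rough sketch of how the MTTD and MTTR figures above might be derived from incident records: the `Incident` fields and the sample timestamps below are hypothetical, and MTTR is assumed here to be measured from the start of the incident (some teams measure it from detection instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    started: datetime    # when the fault began
    detected: datetime   # when monitoring or a person noticed it
    resolved: datetime   # when service was restored


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time to detect: average of (detected - started)."""
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolve: average of (resolved - started), by assumption."""
    return mean_delta([i.resolved - i.started for i in incidents])


# Hypothetical incident log for illustration.
incidents = [
    Incident(datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 5), datetime(2024, 1, 5, 10, 45)),
    Incident(datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 14, 15), datetime(2024, 1, 9, 15, 15)),
]
print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

Tracking both numbers over successive quarters is one way to show the reductions this role is expected to deliver.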
