Sr. Site Reliability Engineer

3 - 8 years

9 - 13 Lacs

Posted: 10 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

We are looking for a Sr. Site Reliability Engineer to join our team. In this role, you will bridge the gap between development and operations by applying software engineering principles to infrastructure and operations tasks. Your primary focus will be ensuring the reliability, availability, performance, and scalability of our production systems, while reducing manual operational work through automation and improving system resilience.
Position Overview
The Site Reliability Engineer will work closely with development and operations teams to design, implement, and maintain highly reliable systems. You will be instrumental in establishing best practices for observability, incident response, and infrastructure management. Your expertise will help reduce operational overhead, improve system performance, and ensure seamless deployments through CI/CD pipelines.
Qualifications
Required Skills and Experience
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 3+ years of experience in SRE, DevOps, or similar roles
  • Strong proficiency with Kubernetes (K8s) and Docker containerization
  • Experience with the ELK stack (Elasticsearch, Logstash, Kibana) for logging and monitoring
  • Good to have: Understanding of Java programming and Java application troubleshooting
  • Working knowledge of SQL and MongoDB databases
  • Familiarity with Angular for frontend monitoring and diagnostic tooling
  • Strong understanding of system architecture, cloud infrastructure, and networking
  • Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible)
  • Experience with monitoring and observability platforms
  • Excellent problem-solving skills and ability to troubleshoot complex systems
  • Strong verbal and written communication skills
Preferred Skills
  • Must have: Experience with AWS public cloud
  • Good to have: Knowledge of Azure and GCP
  • Familiarity with CI/CD tools (Jenkins, GitLab CI, GitHub Actions)
  • Understanding of service mesh technologies (e.g., Istio)
  • Experience with scripting languages (Python, Bash)
  • Understanding of distributed systems and microservices
  • Experience implementing SLOs, SLIs, and SLAs
  • Awareness of security best practices
  • Certifications in relevant technologies (e.g., CKA, AWS Certified)
Roles and Responsibilities
System Reliability and Performance
  • Design, implement, and maintain highly available and scalable infrastructure
  • Define and track SLOs, SLIs, and error budgets
  • Conduct capacity planning and optimize performance
  • Improve system resilience and fault tolerance
  • Perform regular health checks and proactive maintenance
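For candidates less familiar with the terms, the relationship between an SLO, an SLI, and an error budget can be sketched as below. This is only an illustration under stated assumptions: the request-based availability SLI, the function name `error_budget`, and the sample numbers are all hypothetical and do not describe this team's actual targets or tooling.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based availability SLO (illustrative sketch only).

    slo_target is a fraction such as 0.999 for a "three nines" objective.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: the measured fraction of requests that succeeded.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: how many failures the SLO tolerates over this window.
    allowed_failures = (1 - slo_target) * total_requests
    # Fraction of the budget still unspent (negative once the SLO is breached).
    budget_remaining = 1 - failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,
    }


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% target.
report = error_budget(0.999, 1_000_000, 250)
```

With these sample numbers, roughly three quarters of the budget is still available; a team would typically slow down risky releases as the remaining budget approaches zero.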
Monitoring and Observability
  • Deploy and maintain monitoring solutions (e.g., ELK stack)
  • Build dashboards for system metrics, logs, and app performance
  • Set up alerting systems to reduce alert fatigue
  • Implement distributed tracing and ensure service telemetry
  • Maintain comprehensive logging across systems
Incident Management and Response
  • Lead incident response, including mitigation and resolution
  • Conduct root cause analysis and post-incident reviews
  • Maintain incident runbooks and knowledge base
  • Participate in on-call rotation for critical systems
Automation and Toil Reduction
  • Identify and automate repetitive operational tasks
  • Implement Infrastructure as Code for consistent provisioning
  • Create self-service tools for developers
  • Automate testing and deployment processes
CI/CD Pipeline Management
  • Design and maintain reliable CI/CD pipelines
  • Implement automated testing within workflows
  • Support canary deployments, feature flagging, and rollback strategies
Infrastructure Management
  • Manage Kubernetes clusters and containerized applications
  • Oversee config management and version control
  • Implement infrastructure security and compliance
  • Optimize resources and ensure backup/disaster recovery
Collaboration and Knowledge Sharing
  • Partner with development teams to enhance reliability
  • Provide architectural guidance with an SRE lens
  • Conduct documentation and knowledge-sharing sessions
  • Promote SRE best practices across the organization
Working Environment
  • Collaborative, improvement-driven team culture
  • Exposure to cutting-edge technologies
  • Balance of project and operational responsibilities
  • Focus on automation, innovation, and resilience
  • Strong emphasis on learning and growth
Success Metrics
  • Improved system availability and reliability
  • Reduction in MTTD and MTTR
  • Fewer production incidents and outages
  • Increased automation and reduced manual effort
  • Successful SLO implementation and monitoring coverage
  • Positive feedback from dev teams on SRE support
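As a rough illustration of how the MTTD and MTTR metrics above might be computed, the sketch below averages detection and resolution times over a list of incidents. Definitions vary between organizations; here both are measured from the moment the incident started, which is an assumption. The `Incident` record and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when the fault began
    detected: datetime  # when monitoring or a person noticed it
    resolved: datetime  # when service was restored


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time to detect: start of incident until detection."""
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolve: start of incident until resolution (assumed definition)."""
    return mean_delta([i.resolved - i.started for i in incidents])


# Hypothetical incident history.
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5), datetime(2024, 1, 1, 10, 30)),
    Incident(datetime(2024, 1, 8, 14, 0), datetime(2024, 1, 8, 14, 15), datetime(2024, 1, 8, 15, 30)),
]
```

Tracking both values over successive quarters is one way to show whether alerting and runbook improvements are having an effect.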