Site Reliability Engineer

6 - 9 years

15 - 30 Lacs

Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

Technical Expertise And Experience

  • Deep understanding of SRE concepts, including SLIs, SLOs, SLAs, error budgets, and reliability engineering best practices.
  • Expertise in observability tools such as Prometheus, Thanos, and Grafana; CloudWatch is mandatory.
  • Strong hands-on experience with any of the monitoring tools, with a proven ability to set up and manage monitoring and alerting systems.
  • Proficiency in cloud platforms (AWS is mandatory).
  • Strong scripting and automation skills, with proficiency in Python and Bash.
  • Hands-on experience with infrastructure operations and observability.
  • Extensive knowledge and hands-on experience across IT infrastructure, cloud platforms, and networking.
  • Significant experience with Kubernetes, including running, managing, and troubleshooting containerized workloads.
  • Experience working with version control systems like GitHub and implementing CI/CD pipelines is a plus.
  • Experience with infrastructure-as-code (IaC) tools like Terraform or ARM templates is a plus.

SRE Expertise

  • Ability to define and implement SRE best practices for data platforms and data-driven applications, ensuring alignment with organizational goals.
  • Provide mentorship and guidance to teams in adopting SRE principles and improving operational excellence.
  • Collaborate with cross-functional teams to drive reliability, scalability, and performance across data engineering, data science, and platform engineering projects.
  • Monitor system performance and implement solutions to improve stability and efficiency.
  • Automate repetitive operational tasks using infrastructure-as-code and configuration management tools.
  • Create and maintain CI/CD pipelines for automated testing and deployment.
  • Participate in on-call rotations, respond to incidents, and lead post-mortems to drive continuous improvement.

Soft Skills

  • Strong planning and organizational skills to manage individual and team responsibilities efficiently.
  • Excellent problem-solving and troubleshooting skills, with the ability to analyze complex issues and implement effective solutions.
  • Effective real-time communication, ensuring clear and concise updates for both technical and non-technical stakeholders.
  • Ability to work under pressure and manage incidents effectively, ensuring timely resolutions and minimal downtime.
  • Collaborative mindset with the ability to foster a culture of ownership, accountability, and continuous improvement.
