Site Reliability Engineer

5 - 10 years

5 - 9 Lacs

Posted: 1 day ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

  • Design and implement scalable, resilient systems using modern technologies, such as serverless architecture, event stream processing, and Kubernetes
  • Assist in modernizing legacy systems by containerizing applications and services and migrating them into scalable, cloud-native environments
  • Develop and maintain tools for monitoring, alerting, and incident response to ensure system reliability and uptime
  • Collaborate with development teams to integrate feedback from observability tools, creating more robust and dependable solutions
  • Proactively identify and address potential reliability risks, ensuring systems are secure and performant
  • Help the team adopt industry best practices, including branch by abstraction, safe testing in production, and continuous delivery
  • Debug and diagnose complex issues across distributed systems, ensuring effective resolution and root cause analysis
  • Automate manual and repetitive tasks to improve operational efficiency and reduce potential for human error
  • Contribute to post-incident reviews and retrospectives, driving improvements in processes and systems
  • Champion a culture of reliability and continuous learning within the organization
Work Experience:
  • Minimum of 5 years of cumulative experience in Site Reliability Engineering, DevOps, Systems Engineering/Ops or Software Development roles
    • Cross-functional experience is a benefit
    • 1-2 years of programming experience in one or more languages, e.g. Bash scripting
    • 1-2 years of experience working with systems administration (Ops/Infrastructure) or networking
  • Demonstrated experience in working with Linux systems and troubleshooting performance or application-related issues
  • Experience building and managing containers using tools such as Docker, Podman, and Kubernetes, with a focus on developing scalable, portable, and maintainable applications
  • Hands-on experience in managing large-scale distributed systems
  • Strong collaboration and communication skills, including the ability to work effectively across time zones and teams
  • Proven ability to design and implement solutions from inception to production
  • Familiarity with cloud platforms such as Google Cloud Platform and Azure is highly desirable
Focuses:
  • Engineering robust, maintainable systems that prioritize reliability and scalability
  • Taking accountability and ownership of systems and processes, with a strong focus on quality
  • Being proactive in identifying and resolving issues before they impact users
  • Continuously improving processes, tools, and technologies to enhance reliability
  • Experience owning issues and troubleshooting them without overreliance on others
