Site Reliability Engineer

5 - 10 years

6 - 16 Lacs

Posted: Just now | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Title: Site Reliability Engineer

Objectives of this Role

  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Build software and systems to manage platform infrastructure and applications.
  • Improve reliability, performance, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
  • Provide primary operational and performance engineering support for multiple large-scale distributed software applications.
  • Ensure systems are highly available, scalable, and meet performance SLAs.

Responsibilities

  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
  • Conduct performance testing, analysis, and tuning across all layers (UI, API, DB, infrastructure).
  • Collaborate with development teams to profile applications and identify performance bottlenecks.
  • Create and execute performance test plans using tools like JMeter, Gatling, or Locust.
  • Partner with development teams to improve services through rigorous testing, profiling, and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Establish baselines, SLAs, and SLOs for performance and reliability metrics.
  • Create sustainable systems and services through automation and uplifts.
  • Balance feature development speed and system reliability/performance with well-defined service-level objectives.

Required Skills and Qualifications

  • Bachelor's degree (or equivalent) in Computer Science, Engineering, or a related field.
  • Hands-on programming skills in one or more structured/OOP languages such as Python, Java, C/C++, Ruby, or JavaScript.
  • Solid understanding of system internals (Linux, networking, threads, memory management).
  • Experience with distributed storage technologies such as NFS, HDFS, Ceph, or Amazon S3, and with resource management frameworks such as Kubernetes, Apache Mesos, or YARN.
  • Experience with performance testing tools like JMeter, Locust, Gatling, or similar.
  • Knowledge of APM and monitoring tools such as New Relic, AppDynamics, Grafana, or Prometheus.
  • Strong troubleshooting and root cause analysis skills across stack layers.
  • Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.

Preferred Skills and Qualifications

  • Prior experience in performance engineering or site reliability engineering in a distributed environment.
  • Familiarity with CI/CD pipelines and infrastructure-as-code tools (e.g., Terraform, Ansible).
  • Experience in building observability into applications: logging, tracing, and metrics.
  • Coding experience beyond simple scripts, with an emphasis on automation and tool development.
  • Ability to clearly communicate technical findings and improvement plans across teams.

Regards
Arunadevi 7904878135 arunadevi.k@idexcel.com

Idexcel

Information Technology and Services

Ashburn
