Site Reliability Engineer

2 years

0 Lacs

Posted: 23 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

The Site Reliability Team is responsible for monitoring all aspects of MakeMyTrip, including production servers and services. You will act as the first line of defense against any kind of service unavailability or performance degradation in our production services, 24 x 7 x 365.

You will frequently interact with various groups within the organization, like Engineering, Sales, and Product, so developing a good all-around understanding of components, systems, and networks is a must.

We don't expect you to have all the required knowledge when you join us, as many of these skills can be picked up through experience on the job; however, those who want to gain new skills and grow must be prepared to spend time on suitable research and learning. You must be an eager and quick learner with decent communication skills and must be able to use your initiative to tackle a broad range of problems.


Responsibilities:

  • Understand the application architectures and gain domain knowledge of how requests flow within the ecosystem.
  • Configure alerts and ensure metric coverage at the business, application, and system levels.
  • Keep false alerts in check by tuning thresholds and setting up dependencies amongst applications.
  • React to alerts by correlating them, perform first-level debugging to identify the area of the incident's root cause, and escalate problems to the appropriate team until resolution.
  • Actively participate in incident post-mortems and triage incidents within the team.
  • Troubleshoot application problems like unhealthy application containers, high load/CPU, and non-200 response codes using log analysis.
  • Adhere to defined processes and be ready for ad hoc and surprise incidents.
  • Help your coworkers by creating documentation and detailed knowledge sharing for continuous improvement.
  • Communicate and report clearly.


Requirements:

  • 2+ years of relevant experience in a 24x7 Linux production environment.
  • Experience in monitoring, troubleshooting application problems, and incident management.
  • Proficiency in Linux commands that help slice and dice data, like grep, awk, top, and scp, is a must-have.
  • Experience in an AWS-based Dockerized environment is a huge plus.
  • Knowledge of SQL queries like select, insert, where, group by, order by, basic join is required.
  • Hands-on experience in debugging, like finding errors/exceptions in logs and taking heap/thread dumps, is a plus.
  • Bring ideas to improve the overall efficiency of the NOC team.

MakeMyTrip

IT Services and IT Consulting

London
