Site Reliability Engineer Lead

8 - 13 years

20.0 - 30.0 Lacs P.A.

Gurugram

Posted: 1 month ago | Platform: Naukri

Skills Required

Linux Administration, CI/CD, Site Reliability Engineering, AWS, Reliability Engineering, Shell Scripting, SRE, Programming, Application Support, IaC, DevOps, Jenkins, Terraform, GCP, Microsoft Azure, Kubernetes, Python

Work Mode

Remote

Job Type

Full Time

Job Description

Educational Background

  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
  • An advanced degree (Master's or equivalent) is often preferred for senior positions.
  • Relevant certifications, such as Linux Administration, Certified Kubernetes Administrator (CKA), cloud platform certifications (AWS, Azure, Google Cloud), or DevOps methodology certifications (e.g., Certified DevOps Professional).

Skills

  • 8+ years of experience in IT operations, service management, or infrastructure management, including roles such as Site Reliability Engineer or DevOps lead.
  • Proven experience in managing high-availability systems and ensuring operational reliability.
  • Extensive experience in root cause analysis (RCA), incident management, and developing permanent solutions for recurring service disruptions.
  • Hands-on experience with CI/CD pipelines, automation, system performance monitoring, and the implementation of infrastructure as code.
  • Strong background in collaborating with cross-functional teams (development, operations, engineering, etc.) to improve operational processes and service delivery.
  • Experience in managing deployments and risk assessments, and in optimizing event and problem management processes.
  • Familiarity with cloud technologies, containerization, and scalable architecture, including experience with zero-downtime deployment strategies.

Responsibilities

Problem Management

  • Conduct thorough problem investigations and root cause analyses (RCA) to diagnose recurring incidents and service disruptions.
  • Coordinate with incident management teams and operations experts, and collaborate with the Service Operations and Engineering teams, to develop and implement permanent solutions.
  • Monitor the effectiveness of problem resolution activities, provide regular reports on problem management activities, and ensure continuous improvement.
Event Management

  • Define and maintain an event catalog that specifies active events, thresholds, and relevant remediation, and optimize it for efficiency.
  • Develop event response protocols, provide training to teams, and ensure quick and efficient handling of incidents.
  • Collaborate with stakeholders to define events, ensure coverage across Service Operations, and drive improvements based on post-event reviews and feedback.

Deployment Management

  • Own the quality of new release deployments for Service Operations, ensuring that a clear process and responsibilities are assigned for smooth implementation.
  • Develop and maintain deployment schedules, conduct operational readiness assessments, and manage deployment risk assessments to ensure service stability.
  • Oversee the execution of deployment plans, coordinate resources and processes with delivery and lifecycle engineering, communicate with stakeholders, and continuously improve deployment processes based on their feedback.

DevOps/NetOps Management

  • Manage continuous integration and deployment (CI/CD) pipelines, ensuring smooth integration between development and operational teams.
  • Automate operational processes, monitor system performance, and resolve issues related to automation scripts to increase efficiency.
  • Implement and manage infrastructure as code, provide ongoing support for automation tools, and continuously improve DevOps practices.

IGT Solutions

Information Technology and Services

New Delhi

4000+ Employees

207 Jobs

    Key People

  • M. S. S. Gopala Krishnan

    CEO
  • Ramu Reddy

    Group Chief Financial Officer
