Site Reliability Engineer

5 - 10 years

10 - 20 Lacs

Posted: 1 month ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

The Role

We are seeking a Site Reliability Engineer (SRE) to join our team responsible for the operational management of critical applications. The role uses monitoring tools such as Dynatrace and Splunk to ensure system reliability, performance, and scalability. The ideal candidate has a strong background in SRE practices and automation, and a passion for improving system operations.

Key Responsibilities

  • Application Reliability: Ensure the reliability, availability, and performance of over 100 applications through proactive monitoring and incident management.
  • Monitoring & Observability: Implement and maintain observability solutions using Dynatrace, creating dashboards and alerts to monitor system health and performance.
  • IT Operations Management: Use ServiceNow ITOM for configuration management, incident response, and change management processes.
  • Automation & Tooling: Develop automation scripts and tools to reduce manual tasks, improve deployment processes, and enhance system scalability.
  • Incident Management: Lead the response to system incidents, perform root cause analysis, and implement preventive measures to avoid recurrence.
  • Collaboration: Work closely with development, QA, and infrastructure teams to integrate reliability into the software development lifecycle.
  • Capacity Planning: Analyze system performance data to forecast capacity needs and ensure systems can handle future growth.

Key Requirements

  • Experience: 5+ years in Site Reliability Engineering, DevOps, or related roles within large-scale enterprise environments.
  • Technical Skills: Proficiency in monitoring tools like Dynatrace, ITOM platforms like ServiceNow, and scripting languages such as Python or Bash.
  • Automation: Experience with infrastructure-as-code tools (e.g., Terraform, Ansible) and CI/CD pipelines.
  • Operating Systems: Strong knowledge of Linux/Unix systems and networking fundamentals.
  • Experience with container orchestration, including Kubernetes and Docker.
  • Ability to design and own technical solutions for broad or complex requirements, taking a strategic approach.
  • Prior experience deploying cloud services, setting up monitoring and alerting, and handling escalations.
  • Experience supporting high-availability applications, including SaaS environments.
  • Experience charting new DevOps practices and a well-defined roadmap.
  • Problem-Solving: Demonstrated ability to troubleshoot complex system issues and implement effective solutions.
  • Communication: Excellent verbal and written communication skills, with the ability to collaborate across teams.

Preferred Qualifications

  • Certifications: Relevant certifications in SRE, DevOps, or cloud platforms (e.g., AWS, Azure, GCP).
  • Cloud Experience: Familiarity with cloud-native architectures and services.
  • Agile Methodologies: Experience working in Agile/Scrum environments.

Why Join Us?

  • Innovative Environment: Be part of a team that embraces innovation and continuous improvement.
  • Career Growth: Opportunities for professional development and career advancement.
  • Flexible Work: Hybrid work model supporting work-life balance.
  • Impact: Play a crucial role in maintaining the reliability of services that impact millions of customers.

Symphoni Hr

Human Resources Consulting

San Francisco

50-100 Employees

113 Jobs

Key People

  • Alex Johnson, CEO
  • Maria Garcia, Chief Operating Officer
