
Site Reliability Engineer (SRE)

7 - 10 years

15 - 22 Lacs

Posted: 1 month ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description

The Role

We are seeking a Site Reliability Engineer (SRE) to join our team responsible for the operational management of critical applications. The role involves using monitoring tools such as Dynatrace and Splunk to ensure system reliability, performance, and scalability. The ideal candidate will have a strong background in SRE practices and automation, and a passion for improving system operations.

Key Responsibilities

  • Application Reliability: Ensure the reliability, availability, and performance of over 100 applications through proactive monitoring and incident management.
  • Monitoring & Observability: Implement and maintain observability solutions using Dynatrace, creating dashboards and alerts to monitor system health and performance.
  • IT Operations Management: Use ServiceNow ITOM for configuration management, incident response, and change management processes.
  • Automation & Tooling: Develop automation scripts and tools to reduce manual tasks, improve deployment processes, and enhance system scalability.
  • Incident Management: Lead the response to system incidents, perform root cause analysis, and implement preventive measures to avoid recurrence.
  • Collaboration: Work closely with development, QA, and infrastructure teams to integrate reliability into the software development lifecycle.
  • Capacity Planning: Analyze system performance data to forecast capacity needs and ensure systems can handle future growth.

Key Requirements

  • Experience: 5+ years in Site Reliability Engineering, DevOps, or related roles within large-scale enterprise environments.
  • Technical Skills: Proficiency in monitoring tools like Dynatrace, ITOM platforms like ServiceNow, and scripting languages such as Python or Bash.
  • Automation: Experience with infrastructure-as-code tools (e.g., Terraform, Ansible) and CI/CD pipelines.
  • Operating Systems: Strong knowledge of Linux/Unix systems and networking fundamentals.
  • Experience with container orchestration, including Kubernetes and Docker.
  • Ability to design and own technical solutions for broad or complex requirements, taking an insightful and strategic approach.
  • Prior experience deploying cloud services, setting up monitoring and alerting, and handling escalations.
  • Experience supporting high-availability applications, including SaaS environments.
  • Experience charting new DevOps practices and a well-defined roadmap.
  • Problem-Solving: Demonstrated ability to troubleshoot complex system issues and implement effective solutions.
  • Communication: Excellent verbal and written communication skills, with the ability to collaborate across teams.

Preferred Qualifications

  • Certifications: Relevant certifications in SRE, DevOps, or cloud platforms (e.g., AWS, Azure, GCP).
  • Cloud Experience: Familiarity with cloud-native architectures and services.
  • Agile Methodologies: Experience working in Agile/Scrum environments.

Why Join Us?

  • Innovative Environment: Be part of a team that embraces innovation and continuous improvement.
  • Career Growth: Opportunities for professional development and career advancement.
  • Flexible Work: Hybrid work model supporting work-life balance.
  • Impact: Play a crucial role in maintaining the reliability of services that impact millions of customers.

Talkmetakeme Software Solutions

Software Development

Tech City

50-100 Employees

11 Jobs

Key People

  • Alice Johnson, CEO
  • Bob Smith, CTO
