Posted: 1 month ago
Work from Office
Full Time
The Role

We are seeking a Site Reliability Engineer (SRE) to join our team responsible for the operational management of critical applications. This role involves using monitoring tools such as Dynatrace and Splunk to ensure system reliability, performance, and scalability. The ideal candidate will have a strong background in SRE practices and automation, and a passion for improving system operations.

Key Responsibilities

Application Reliability: Ensure the reliability, availability, and performance of over 100 applications through proactive monitoring and incident management.
Monitoring & Observability: Implement and maintain observability solutions using Dynatrace, creating dashboards and alerts to monitor system health and performance.
IT Operations Management: Use ServiceNow ITOM for configuration management, incident response, and change management processes.
Automation & Tooling: Develop automation scripts and tools to reduce manual tasks, improve deployment processes, and enhance system scalability.
Incident Management: Lead the response to system incidents, perform root cause analysis, and implement preventive measures to avoid recurrence.
Collaboration: Work closely with development, QA, and infrastructure teams to integrate reliability into the software development lifecycle.
Capacity Planning: Analyze system performance data to forecast capacity needs and ensure systems can handle future growth.

Key Requirements

Experience: 5+ years in Site Reliability Engineering, DevOps, or related roles within large-scale enterprise environments.
Technical Skills: Proficiency in monitoring tools like Dynatrace, ITOM platforms like ServiceNow, and scripting languages such as Python or Bash.
Automation: Experience with infrastructure-as-code tools (e.g., Terraform, Ansible) and CI/CD pipelines.
Operating Systems: Strong knowledge of Linux/Unix systems and networking fundamentals.
Experience with container orchestration, including Kubernetes and Docker.
Ability to design and own technical solutions for broad or complex requirements, with insightful and strategic approaches.
Prior experience deploying cloud services, monitoring, and alerting, and handling escalations.
Experience supporting high-availability applications, including SaaS environments.
Experience charting new DevOps practices and a well-defined roadmap.
Problem-Solving: Demonstrated ability to troubleshoot complex system issues and implement effective solutions.
Communication: Excellent verbal and written communication skills, with the ability to collaborate across teams.

Preferred Qualifications

Certifications: Relevant certifications in SRE, DevOps, or cloud platforms (e.g., AWS, Azure, GCP).
Cloud Experience: Familiarity with cloud-native architectures and services.
Agile Methodologies: Experience working in Agile/Scrum environments.

Why Join Us?

Innovative Environment: Be part of a team that embraces innovation and continuous improvement.
Career Growth: Opportunities for professional development and career advancement.
Flexible Work: Hybrid work model supporting work-life balance.
Impact: Play a crucial role in maintaining the reliability of services that impact millions of customers.
Talkmetakeme Software Solutions
Trivandrum, Kerala, India
Salary: Not disclosed