Site Reliability Engineer II

NCR Atleos

2 - 5 years

9 - 13 Lacs

Hyderabad

Posted: 1 month ago

Skills Required

Kubernetes, Python, performance tuning, SolarWinds, SRE, monitoring tools, scripting, infrastructure operations, system administration, Grafana, GCP, application support, PowerShell, Splunk, enterprise support, Bash, Prometheus, communication skills, ITIL

Work Mode

Work from Office

Job Type

Full Time

Job Description

Monitoring & Observability: Maintain and enhance monitoring systems using Prometheus, Grafana, Splunk, and SolarWinds. Ensure timely detection and resolution of issues through effective alerting and dashboards.
Application Support: Provide L1/L2 support for business-critical applications, including incident triage, health checks, deployment validation, and coordination with development and product teams.
Incident Management: Lead response for moderate to complex incidents, perform root cause analysis, and contribute to post-incident reviews and documentation.
Automation & Scripting: Develop and maintain automation scripts using Python, PowerShell, or Bash to streamline operational tasks and reduce manual effort.
Infrastructure Support: Monitor and support infrastructure health across on-prem and cloud platforms (GCP, Azure), including performance tuning and capacity planning.
Kubernetes Operations: Support containerized workloads and microservices running on Kubernetes clusters. Perform health checks, troubleshoot deployments, and optimize resource usage.
Process Adherence: Participate in ITIL-aligned processes for incident, change, and problem management. Ensure compliance with operational standards and audit requirements.
Knowledge Sharing: Document SOPs, recurring issues, and resolutions. Mentor junior engineers and contribute to the team knowledge base.
Collaboration: Work closely with development, QA, and platform teams to support deployments, platform transitions, and reliability improvements.
Continuous Improvement: Proactively identify areas for improvement in system reliability, alerting, and operational workflows.
24/7 Support: Provide on-call support for critical issues.

Qualifications:

Bachelor's / Master's degree in Computer Science, Engineering, or a related field.
2-5 years of experience in SRE, infrastructure operations, system administration, or application support.
Proficiency in monitoring tools (Prometheus, Grafana, Splunk, SolarWinds).
Strong scripting skills (Python, Bash, PowerShell).
Experience with cloud platforms (GCP, Azure, AWS).
Hands-on experience with Kubernetes in production environments.
Solid understanding of ITIL practices and enterprise support workflows.
Hands-on experience with automation tools (ActiveBatch or similar).
Strong analytical, communication, and problem-solving skills.
Willingness to work in a 24/7 support environment and take ownership of reliability outcomes.
