We are seeking a Site Reliability Engineer to ensure the reliability, performance, and security of global login and authentication platforms.
This role is client-facing and offers the opportunity to apply automation, operational discipline, and deep technical skills to support mission-critical services and applications.
Years of Experience:
- 4+ years
Technical Skills:
- Strong expertise in authentication and security protocols and platforms: OAuth 2.0, OIDC, SAML, MFA, session/token flows, Okta, and Transmit.
- Hands-on experience with cloud-native and hybrid environments (GCP, PCF, AWS, Azure).
- Proficiency in supporting and debugging microservices deployed on cloud platforms, primarily built on Java/.NET Core with PostgreSQL and MongoDB.
- Build and maintain monitors, dashboards, and synthetic transactions using Splunk, AppDynamics, Datadog, Grafana, or equivalent tools.
- Design and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets to measure and maintain service health.
- Perform Linux system administration and develop automation scripts using Python, Go, Java, Shell, or similar languages.
- Provide support for cloud applications across GCP, Azure, AWS, or PCF environments.
- Manage production deployments and resolve deployment failures efficiently.
- Work with CI/CD tools such as Bitbucket Pipelines, Bamboo, Harness, or GitHub Actions to streamline development workflows.
- Apply strong knowledge of ITIL processes including Incident, Problem, Change, and Release Management.
- Use ITSM tools such as JIRA, Remedy, or ServiceNow for issue tracking and service management.
- Collaborate effectively with cross-functional teams and communicate clearly with stakeholders.
Process Skills:
- Experience in SRE and production support for business-critical applications.
- Proactive monitoring, automation, and toil reduction mindset.
- Incident triage, resolution, and root cause analysis.
- Contribution to release readiness, production deployments, and service restoration.
- Strong understanding of Agile and DevOps methodologies.
- Creation and maintenance of knowledge base articles, SOPs, and recovery guides.
Behavioral Skills:
- Collaborate and communicate effectively with stakeholders to ensure client satisfaction.
- Resolve technical issues during incidents, applying workarounds for immediate resolution and identifying permanent solutions.
- Participate as a team member and foster teamwork through inter-group coordination across the project's modules.
- Work under the direction of the SRE Architect.
- Model transaction flows and set up proactive monitoring for them.
- Automate operations runbooks.
Certification:
- Relevant certifications in Cloud (GCP preferred), Splunk, ITSM, and Linux administration are a plus.
Responsibilities as Team Member/Individual Contributor:
As a SAvE engineer, you are expected to:
- Work in a 24x7 rotational shift model.
- Monitor application availability.
- Act quickly on application and batch job failures.
- Handle incident and change management effectively.
- Initiate and drive the Techline during outages, major incidents, and batch abends, and restore service as quickly as possible.
- Handle daily, weekly, and monthly reporting.
- Own and deliver the user stories assigned in each sprint.
- User stories span application code debugging, issue analysis, code fixes, knowledge base creation, SOP documentation, production deployments, pre- and post-patching/maintenance activities, and service requests.
- Build monitoring solutions using APM and observability tools such as Splunk, AppDynamics, ThousandEyes, ITRS, AppMetrics, MoogSoft, and Kafka.
- Automate day-to-day operational tasks.
Responsibilities as Lead/SME:
As a Lead, you are expected to:
- Lead and mentor team members.
- Handle customer communication, critical issues, Techlines, and reporting.
- Drive process initiatives, service improvements, and automation.
- Ensure all deliverables are met without any SLA breaches.
- Contribute to Mphasis-level initiatives.
- Work toward the customer roadmap, process improvements, and automation.
Additional Requirements:
- Immediate joiners only.
- Ready to work from the office.
- Ready to work in 24x7 rotational shifts.
- Must live within the city limits where office cabs operate.
- Work from the Mphasis office (ODC).
- Work location: Hyderabad and Bangalore.
- Rotational shift timings: 6:30 AM to 3:30 PM, 2:00 PM to 11:00 PM, and 10:00 PM to 6:00 AM.