We need a strong candidate with solid experience in stakeholder management and SRE team management.
Experience on production engineering / production support projects is a must, including managing teams that work in a 24/7 model.
Incident, change, and service request management are a daily routine, so a good understanding of them is required; the candidate should know how to manage the workload and rotate FTEs as and when required.
Awareness of, and the ability to manage, ad hoc activities such as vulnerability fixes and patching is required.
Should be able to lead BAU governance activities at a daily, weekly, and monthly cadence, with the necessary reporting data.
Knowledge of applying SRE practices to daily operations is key.
Computer Science and/or Engineering degrees are preferred.
Domain experience in domestic banking application areas (IMPS/UPI) will be a great advantage.
Working Experience/ Awareness:
24x7 operations support model for mission critical applications and infrastructure using ServiceNow as the ITSM ticketing tool.
Hybrid and private-cloud operational support / administration activities such as provisioning, capacity management, reliability management, monitoring, restoration, etc.
Working knowledge of AppDynamics and Splunk for monitoring and for setting up observability is key.
CI/CD toolchains: setting up and running deployment pipelines and propagating changes across different environments.
Maintaining middleware such as MQ, as well as application servers (Tomcat).
Maintaining Hazelcast data storage platform clusters and Control-M job schedulers.
Kubernetes cluster management, monitoring, and remediation. Knowledge of Docker is important.
Automating deployments and scripting self-healing workflows based on telemetry.
Work closely with the team to define SLIs and configure SLOs, respond to threshold alerts, and optimize monitoring capability.
Work closely with the team to understand the code and configuration artifacts, and to debug and fix issues as they arise.
Must be inclined to work on proof-of-concept solutions that improve reliability, such as those incorporating AI models for event correlation and assisted triage.
Able to lead and drive the SRE team to work in parallel on service and change requests, the defect management board, and backlog management in an agile manner.
Good to have:
SRE Foundation certification by DevOps Institute, or an equivalent SRE certification from a recognized body.
ITIL / ITSM certification.
About Virtusa
Teamwork, quality of life, professional and personal development: values that Virtusa is proud to embody. When you join us, you join a team of 27,000 people globally that cares about your growth, one that seeks to provide you with exciting projects and opportunities, and to let you work with state-of-the-art technologies throughout your career with us.
Great minds, great potential: it all comes together at Virtusa. We value collaboration and the team environment of our company, and seek to provide great minds with a dynamic place to nurture new ideas and foster excellence.
Virtusa was founded on principles of equal opportunity for all, and so does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit, and business need.