The Site Reliability Engineer (SRE) will be a key contributor to the Product Reliability Engineering (PRE) organization, supporting the Consumer Authentication space. As part of the PRE team, you will be responsible for availability, performance, efficiency, change management, monitoring, emergency response, and capacity planning for various production systems.
You will have the opportunity to participate in architectural reviews and identify design gaps, detect problems, perform root cause analysis, and contribute through innovative technical solutions that help to ensure system availability and team continuity.
You will leverage your creativity and critical thinking skills to enhance current workflows and processes through automation, toil reduction, and integrated technical solutions. In this role you will be responsible for leading and supporting various cross-functional projects, solving complex technological problems, and maintaining the availability, resiliency, and performance of high-throughput, business-critical Fintech APIs.
Responsibilities:
- Provide 24x7x365 application support across multiple systems and technologies, including rotational on-call support during non-business hours.
- Apply configuration updates, operational break-fixes, and other proactive maintenance activities to ensure the availability of production systems.
- Design and develop tools for automation, toil reduction, and process improvement.
- Understand application architecture and transaction flows to provide support to internal and external teams.
- Remediate cybersecurity findings to ensure compliance with audit regulations.
- Execute security patching under a follow-the-sun model.
- Work closely with development and support teams for problem identification and resolution.
- Provide verification for both Visa-related services, such as certificate installations and verifications, and Cardinal services, to ensure systems are correctly configured, compliant, and operational.
- Manage the lifecycle of standard products owned and used by PRE.
- Participate in functional and technical meetings throughout the development lifecycle.
- Provide direct support during production deployments.
- Manage and support incremental code pushes to production, ensuring that changes are deployed in manageable, low-risk increments to minimize disruption and facilitate rollbacks if necessary.
- Act as a limited change approver, ensuring that changes to the production environment are thoroughly reviewed and approved.
- Collaborate with DevSecOps teams to ensure new applications adhere to the security and high availability standards of Visa, while meeting operational hand-off requirements.
- Maintain accountability to ensure proper controls are in place for monitoring and observability of systems with the goal of 99.99% uptime.
Basic Qualifications:
- Relevant work experience in IT Operations, Systems Engineering, SRE, or similar discipline.
- Hands-on experience with Linux/Unix administration and basic network troubleshooting via CLI.
- Experien