The Data Warehouse BizOps team is looking for a Site Reliability Engineer who can help us solve problems and enhance our capabilities by supporting applications, services, and platforms. This position supports the Data Delivery and Analytics program and the growing needs of products such as MyMPA and QMR.
Tech Skills:
Unix, shell scripting, SQL, Python, Apache NiFi, Splunk, Dynatrace, Jenkins, Git, XLR, etc.
We regularly review our run state, not only from an internal perspective but also by giving our development partners feedback on how we can improve the customer experience of our applications.
Responsibilities:
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
- Analyse ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
- Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices.
- Practice sustainable incident response and blameless post-mortems.
- Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
- Work with a global team spread across tech hubs in multiple geographies and time zones.
- Share knowledge and mentor junior team members.
Qualifications:
- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
- Experience with algorithms, data structures, scripting, pipeline management, and software design.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Ability to help debug and optimize code and automate routine tasks.
- We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
- Experience in one or more of the following languages is preferred: C, C++, Java, Python, Go, Perl or Ruby.
- Interest in designing, analyzing and troubleshooting large-scale distributed systems.
- We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
- Experience with industry-standard CI/CD tools such as Git/Bitbucket, Jenkins, Maven, Artifactory, and Chef. Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is desired.