The main responsibility of the Stability & Resilience division is to support the IT Strategy & Production. It gathers activities that contribute directly to the stability and integrity of Production and to the resilience of the Information Systems.
Within the division, the Global Monitoring & Log Analytics domain oversees Global Production Observability Systems and provides platforms and services built around Elasticsearch, Splunk and Dynatrace technologies.
This domain includes the following key services:
Global Monitoring, providing Dynatrace services
Splunk (to be decommissioned by the end of the year)
Logs As a Service, providing a log management platform as a service based on the Elastic Stack (Elasticsearch, Kibana, Fleet, Elastic Agent, Logstash, Ingest pipelines) and Kafka technology
Elastic As a Service, providing:
o Elasticsearch (+Kibana) dedicated specific clusters for some applications on its servers
o Elasticsearch dedicated standard clusters on dMZR (based on an IBM Cloud product)
CyberSOC central data platform (Databus based on Kafka+Logstash, and DAP based on Elasticsearch)
Leveraging the expertise of the BNP Paribas Paris teams and ISPL IT skills, the goal is to enable flawless application production by providing secure and stable environments and by ensuring that all actions on production environments are performed in a controlled manner.
The Automation and System Engineer will be closely integrated into the STA04 domains SRE team, whose members are in charge of:
Monitoring, supervising and supporting servers (mostly on-premises standalone servers, but also some IT Cloud or dMZR VMs)
Carrying out the day-to-day actions needed on servers, including obsolescence and security management
Deploying and supervising OS upgrades and patches
Deploying our software components (new environments, upgrades, etc.)
Designing and evolving our Ansible playbooks
Being part of a follow-the-sun team to ensure all tasks can be done safely and as quickly as possible
Responsibilities
Direct Responsibilities
Perform housekeeping of our services (check that our platforms and services are constantly up and running with adequate performance for our customers)
Ensure patching and updates for systems, applications and images
Ensure certificate renewals
Contribute to automation (Ansible) development for production and non-production tasks
Manage incidents on our stack agents
Manage production anomalies and non-recurrent operations
Carry out application and platform deployment tasks
For a predefined application scope, handle ITSM processes based on the ITIL framework:
o Incidents
o Requests
o Changes
Ensure that SLA targets are met for the above activities
Hand over to the Paris teams if the required knowledge and skills are not available in ISPL
General Responsibilities
Contribute to knowledge transfer with the Paris teams
Contribute to the definition of procedures and processes necessary for the team
Help build team spirit and integrate into BNP Paribas culture
Contribute to regular activity reporting and KPI calculation
Contribute to continuous improvement actions
Work with cross-functional teams to ensure IT services align with business needs and service level agreements (SLAs).
Technical & Behavioral Competencies