As an FM Specialist, you will ensure end-to-end fault, incident, and problem management with a focus on meeting SLA/OLA targets. You will handle high-level escalations and resolve complex network issues through in-depth technical and procedural expertise.
HOW YOU WILL CONTRIBUTE AND WHAT YOU WILL LEARN
- Perform fault management and configuration of the Red Hat OpenShift Platform, and coordinate with project teams on HLD/LLD/TOL/design reviews.
 - Troubleshoot issues with worker nodes, maintenance, and scale-out tasks.
 - Perform maintenance activities on the cluster, such as patching, configuration changes, and operator installation.
 - Coordinate hand in hand with the project team on network integration activities, handle trouble tickets and CRs within SLA, and drive automation of tasks.
 - Handle customer issues and ensure that end-customer services are maintained.
 - Prepare MOPs/WIs for activities and new learnings, and coordinate with the care team to find and analyze RCAs.
 - Prepare, implement, and verify the configuration and integration of a node/system, track tool-related issues, and escalate in a timely manner per the pre-defined matrix.
 - Assist the application team in resolving pod issues such as CrashLoopBackOff, ImagePullBackOff, and other errors, and in implementing best practices for configuring readiness and liveness probes.
 
Qualifications
You have:
- 4-5 years of experience in OpenStack Cloud, with a B.E./B.Tech. or equivalent degree.
 - Hands-on experience with OpenShift Container Platform, including installation, configuration, and administration.
 - Perform daily health checks of the cluster and fix issues based on observations; implement and customize Grafana dashboards as needed to monitor the cluster effectively.
 - Assist the application team in resolving pod issues such as CrashLoopBackOff, ImagePullBackOff, and other errors, and in implementing best practices for configuring readiness and liveness probes.
 - Manage and modify SCCs based on custom requirements and assign them to specific projects; perform certificate management when existing certificates expire or as otherwise needed.
 - Check cluster utilization and share reports with the business when needed. Forecast capacity growth requirements and handle capacity increases. End-to-end architectural knowledge of ACM, ACS, and ODF is expected.
 
It would be nice if you also had:
- Knowledge of RHOCP architecture and experience with microservices, containers, and orchestration (Kubernetes, Docker).
 - Knowledge of ITSM ticketing tools and of incident, problem, and change management processes.