About Client
Hiring for One of the Most Prestigious Multinational Corporations!
Job Title :
Qualification :
Relevant Experience :
Must Have Skills :
- 8+ years of overall experience in roles such as Site Reliability Engineering, DevOps, or Linux Systems Engineering.
- 5+ years of hands-on, intensive experience administering, automating, and troubleshooting Red Hat OpenShift (OCP 4.x preferred) in large-scale production environments.
- Proven experience in a senior or lead engineering role, demonstrating ownership of complex projects and mentorship of others.
Technical Skills
- Expert-Level OpenShift: Deep, authoritative knowledge of OCP installation (IPI/UPI), upgrades, cluster administration, node management, and disaster recovery.
- Kubernetes Mastery: A deep, fundamental understanding of Kubernetes architecture and components (etcd, kube-apiserver, scheduler, etc.) and of Operators (OLM).
- Infrastructure as Code (IaC): Strong proficiency with Ansible and Terraform for automating infrastructure provisioning and configuration management.
- Programming/Scripting: Advanced scripting and software development skills in Python or Go, as well as Bash.
- Observability: Hands-on experience building and managing monitoring and logging solutions (e.g., Prometheus, Grafana, Thanos, Alertmanager, ELK Stack, Splunk, Fluentd/Vector/OTEL).
- CI/CD & GitOps: Expertise with CI/CD tooling (e.g., Tekton, Jenkins, GitLab CI, ArgoCD, GitHub Actions).
- Core Infrastructure: Strong proficiency in Linux/RHEL administration, networking (SDN, OVS, routing, firewalls, load balancers), and storage (Ceph, NFS, block storage, object storage).
Good to Have Skills :
- Analytical Mindset: Exceptional problem-solving skills with the ability to diagnose complex technical issues across multiple platform layers.
- Ownership and Accountability: A strong sense of ownership and the drive to see issues through to resolution.
- Communication: Excellent communication and interpersonal skills, capable of explaining complex topics to both technical and non-technical audiences.
- Composure: Ability to remain calm and effective under pressure during critical incidents.
On-Call
- Willingness to participate in a 24x7 on-call rotation to handle critical platform incidents.
Roles and Responsibilities :
- Define and Uphold Reliability Standards: Establish and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for the OpenShift platform and its core services.
- Automate Everything: Design, build, and maintain robust automation to handle the full lifecycle of OpenShift clusters, including provisioning, upgrades, patching, scaling, and disaster recovery.
- Reduce Toil: Proactively identify and eliminate manual, repetitive operational work by developing and maintaining automation scripts (Python, Go, Bash) and Ansible playbooks.
- Incident Response and Root Cause Analysis: Lead high-severity incident response and conduct thorough, blameless post-mortems to identify and implement permanent fixes that prevent recurrence.
- Proactive Health Management: Develop and implement automated health checks and self-healing capabilities to ensure cluster and application resilience.
- Subject Matter Expertise: Serve as the top-tier technical authority for OpenShift Container Platform architecture, networking (OVN-Kubernetes, SDN), load balancing, cross-cluster management, storage (OpenShift Data Foundation/Ceph), and security.
- Observability: Architect and manage a comprehensive observability stack (e.g., Prometheus, Grafana, ELK/Fluentd) to provide deep insights into platform and application performance.
- CI/CD and GitOps: Engineer and optimize CI/CD pipelines for both platform components and tenant applications, championing GitOps principles for declarative configuration management.
- Capacity and Performance: Conduct advanced performance tuning, load testing, and capacity planning to ensure the platform can meet future demand.
Location :
CTC Range :
Notice period :
Shift Timing :
Mode of Interview :
Mode of Work :
Bhuvaneshwari S
Senior Specialist
Black and White outsourcing Pvt Ltd
Bangalore, Karnataka, India.
bhuvaneshwari@blackwhite.in | www.blackwhite.in