Site Reliability Engineer

5 - 10 years

4 - 7 Lacs

Posted: 6 hours ago | Platform: Foundit

Skills Required

  • Scripting (Bash/Python)
  • Monitoring & observability (Prometheus/Grafana/Dynatrace)
  • Virtualization technologies (KVM/VMware)

Work Mode

On-site

Job Type

Full Time

Job Description

We are seeking a highly skilled OpenShift Virtualization Engineer to join our dynamic cloud platform team. The ideal candidate will design, implement, administer, and manage our OpenShift Virtualization (KubeVirt) environment, ensuring stability, performance, and security of virtualized workloads. This role emphasizes day-2 operations, proactive monitoring, automation, and operational excellence in a global infrastructure setting.

Key Responsibilities

Observability, Monitoring, Logging & Troubleshooting

  • Implement and maintain end-to-end observability solutions, integrating monitoring, logging, and tracing tools such as Dynatrace and Prometheus/Grafana.
  • Explore and implement Event-Driven Architecture (EDA) for enhanced real-time monitoring and automated response.
  • Perform deep-dive Root Cause Analysis (RCA) for global compute issues, identifying abnormalities and blind spots.
  • Monitor VM health, resource usage, and performance metrics proactively.
  • Detect unusual activity indicative of misconfigurations or security compromises.

Solution Design & Consulting

  • Provide technical consulting to application teams requiring OpenShift Virtualization (OSV) solutions.
  • Design, implement, and validate custom OSV clusters and VM solutions for critical applications with specialized requirements.

Capacity Management

  • Conduct capacity planning and forecasting for compute, memory, storage, and network resources.
  • Analyze resource utilization trends and provide recommendations for optimization or scaling.
  • Collaborate with application teams to project future demand and ensure scalability.
  • Develop and maintain capacity models and strategic reports.

OSV Automation & Efficiency

  • Develop automation solutions for repetitive OSV tasks (configuration, VM management, auditing, remediation).
  • Implement Site Reliability Engineering (SRE) practices to improve platform stability and efficiency.
  • Manage Role-Based Access Control (RBAC), namespaces, and resource quotas (CPU, memory, storage).

Knowledge Management

  • Maintain comprehensive internal documentation and customer-facing content to facilitate self-service.
  • Clearly articulate platform capabilities to stakeholders and end-users.

Required Skills & Experience

  • Bachelor's degree in Computer Science, IT, or related field (Master's preferred).
  • 5+ years in IT infrastructure, including 2+ years focused on Kubernetes/OpenShift.
  • Production experience with OpenShift Virtualization (KubeVirt).
  • Strong understanding of Kubernetes concepts: Pods, Deployments, Services, Storage Classes, Operators, Custom Resources.
  • Linux administration and networking fundamentals.
  • Scripting proficiency (Bash, Python) for automation tasks.
  • Experience with monitoring tools: Prometheus, Grafana, Dynatrace.
  • Solid understanding of virtualization technologies (KVM, VMware).
  • Excellent problem-solving skills and cross-functional collaboration abilities.

Preferred Skills & Experience

  • Red Hat Certified Specialist in OpenShift Virtualization or equivalent certification.
  • Experience with Infrastructure as Code (IaC) tools: Ansible, Terraform, OpenShift GitOps.
  • Familiarity with Software-Defined Networking (SDN) and Software-Defined Storage (SDS).
  • Public cloud experience (AWS, Azure, GCP) and hybrid cloud architecture.
  • Knowledge of CI/CD pipelines and DevOps methodologies.

Teamware Solutions

IT Services and IT Consulting

Chennai, Tamil Nadu