On-Prem Infrastructure Engineer / SRE

10 years

0 Lacs

Posted: 1 day ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Location: Pan India

Experience: 5–10 Years

Role: On-Prem Infrastructure Engineer / Site Reliability Engineer (SRE)

Job Summary

We are seeking a skilled On-Prem Infrastructure Engineer / SRE to manage and support NVIDIA’s on-prem engineering cloud infrastructure across multiple data centers. The ideal candidate will have strong experience in bare-metal infrastructure management, observability tools, automation, and production support. This role is critical in ensuring uptime, reliability, and operational excellence for engineering services.

Key Responsibilities

  • On-Prem Infrastructure Management
Manage and operate NVIDIA’s on-prem infrastructure across distributed data centers.
Maintain high availability, reliability, and readiness of on-prem engineering cloud environments.
Perform lifecycle management of bare-metal servers and underlying hardware.
  • Service Level Management
Guard and maintain Service Level Agreements (SLAs) for mission-critical engineering services.
Implement and maintain monitoring, alerting, and incident response workflows.
Drive root cause analysis (RCA), conduct post-mortems, and ensure corrective and preventive actions.
  • Observability & Monitoring
Deploy, configure, and manage observability tools such as Prometheus, Grafana, and the ELK Stack.
Maintain KPI monitoring pipelines using Jenkins, Python, and ELK.
Develop and enhance custom monitoring dashboards and business-specific alerting rules.
  • Automation & Optimization
Contribute to capacity planning, resource optimization, and performance tuning initiatives.
Develop automation scripts/tools using Python, Go, Bash, or Jenkins pipelines.
Improve operational efficiency through continuous automation.
  • Day-to-Day Operations & Support
Monitor system alerts, troubleshoot incidents, and resolve user-reported issues.
Participate in war rooms during major or high-impact incidents.
Ensure timely escalation and resolution of production issues.
  • Collaboration & Documentation
Create and maintain technical documentation for operational procedures, architectures, and troubleshooting steps.
Work closely with engineering, DevOps, hardware, and data center teams to improve overall infrastructure reliability.

Required Skills & Experience

Strong hands-on experience in bare-metal server management using tools such as IPMI, Redfish, KVM, or similar technologies.
Experience with automation and scripting using Python, Go, Bash, and Jenkins (CI/CD pipelines).
Practical experience with infrastructure tools: Kubernetes, MySQL, Prometheus, Grafana, ELK (Elasticsearch, Logstash, Kibana).
Solid understanding of system performance, capacity planning, and datacenter operations.
Strong troubleshooting, incident-response, and operational debugging skills.
Ability to work in fast-paced environments and handle production-critical scenarios.

Nice-to-Have Skills

Familiarity with NVIDIA hardware: GPUs, Tegra systems, DGX platforms, etc.
Experience in large-scale distributed systems or high-performance computing environments.

Soft Skills

Strong communication and collaboration abilities.
Analytical mindset with a focus on problem-solving.
Ability to maintain composure under pressure in incident environments.
Detail-oriented with strong documentation habits.
