Senior Site Reliability Engineer

NVIDIA

Experience: 8+ years

Salary: Not disclosed

New Delhi, Delhi, India

Posted: 2 months ago

Skills Required

reliability, graphics, gaming, technology, ai, vision, engineering, development, deployment, software, measurement, automation, efficiency, service, design, triage, monitoring, latency, metrics, support, virtualization, kubernetes, docker, code, terraform, aws, iam, azure, stack, openstack, openshift, python, diversity

Work Mode

On-site

Job Type

Full Time

Job Description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

NVIDIA is looking for a passionate member to join our DGX Cloud Engineering Team as a Sr. Site Reliability Engineer. In this role, you will play a significant part in helping to craft and guide the future of AI & GPUs in the Cloud. NVIDIA DGX Cloud is a cloud platform tailored for AI tasks, enabling organizations to transition AI projects from development to deployment in the age of intelligent AI. Are you passionate about cloud software development, and do you strive for quality? Do you pride yourself on building cloud-scale software systems? If so, join our team at NVIDIA, where we are dedicated to delivering GPU-powered services around the world!

What You'll Be Doing

You will play a crucial role in the success of the Omniverse on DGX Cloud platform by helping to build our deployment infrastructure processes, creating world-class SRE measurements and automation tools that improve the efficiency of operations, and maintaining a high standard of service operability and reliability.

Design, build, and implement scalable cloud-based systems for PaaS/IaaS.
Work closely with other teams on new products or features/improvements of existing products.
Develop, maintain and improve cloud deployment of our software.
Participate in the triage and resolution of complex infrastructure-related issues.
Collaborate with developers, QA and Product teams to establish, refine and streamline our software release process and software observability to ensure service operability, reliability and availability.
Maintain services once live by measuring and monitoring availability, latency, and overall system health using metrics, logs, and traces.
Develop, maintain and improve automation tools that help improve the efficiency of SRE operations.
Practice balanced incident response and blameless postmortems.
Be part of an on-call rotation to support production systems.

What We Need To See

BS or MS in Computer Science or equivalent program from an accredited University/College.
8+ years of hands-on software engineering or equivalent experience.
Demonstrate understanding of cloud design in the areas of virtualization and global infrastructure, distributed systems, and security.
Expertise in Kubernetes (K8s) & KubeVirt and building RESTful web services.
Understanding of building agentic AI solutions, preferably with NVIDIA open source AI solutions.
Demonstrated working experience with SRE principles such as metrics emission for observability, monitoring, and alerting using logs, traces and metrics.
Hands-on experience with Docker, containers, and Infrastructure as Code tools such as Terraform, and with deployment CI/CD.
Knowledge of working with cloud service providers (CSPs), for example AWS (Fargate, EC2, IAM, ECR, EKS, Route53, etc.) and Azure.

Ways To Stand Out From The Crowd

Expertise in technologies such as StackStorm, OpenStack, and Red Hat OpenShift, and in AI databases such as Milvus.
A track record of solving complex problems with elegant solutions.
Prior experience with Go, Python, and React.
Demonstrated delivery of complex projects in previous roles.
Demonstrated ability to develop frontend applications using concepts such as SSA and RBAC.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
JR2000387

More Jobs at NVIDIA

Senior System Software Engineer – Simulation and Virtualization

Mumbai Metropolitan Region

Experience: 5 yrs

Salary: Not disclosed

Senior System Software Engineer – Simulation and Virtualization

Gurugram, Haryana, India

Experience: 5 yrs

Salary: Not disclosed

Senior System Software Engineer

Pune, Maharashtra, India

Experience: Not specified

Salary: Not disclosed

Senior Site Reliability Engineer

Pune, Maharashtra, India

Experience: Not specified

Salary: Not disclosed

Senior System Software Engineer, GPU Firmware

Pune, Maharashtra, India

Experience: Not specified

Salary: Not disclosed
