Senior Site Reliability Developer

4 - 10 years

0 Lacs

Posted: 14 hours ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

OCI is Oracle's next-generation cloud platform, built for the most demanding enterprise workloads. We deliver high-performance computing, storage, networking, and platform services at global scale.

Senior Site Reliability Engineer (SRE)

This is a hands-on, high-impact role in which you will ensure the reliability, scalability, and security of the cloud-scale services that power AI workloads across Oracle Cloud.

Qualifications

  • 4-10 years of experience in site reliability, DevOps, or systems engineering.
  • Strong background in operating large-scale, distributed, and highly available systems.
  • Proficient with Linux, Python, and shell scripting.
  • Hands-on experience with Kubernetes (OKE, EKS, GKE, or similar) and Docker.
  • Experience with Infrastructure as Code (Terraform, Ansible, etc.) on a major cloud provider.
  • Knowledge of cloud networking, security, and routing (VPC, CIDR, security groups).
  • Familiarity with observability tools (Prometheus, Elasticsearch, Fluentd, Grafana).
  • Experience with CI/CD pipelines, git workflows, and agile development.
  • Understanding of disaster recovery, redundancy, and operational uptime planning.
  • Strong troubleshooting, problem-solving, and communication skills.
  • BS/MS in Computer Science or equivalent experience.

Desired Attributes

  • Resourceful and pragmatic in solving operational challenges.
  • Strong focus on automating repetitive tasks and reducing toil.
  • Committed to shared responsibility and improving the on-call experience.
  • Detail-oriented with strong critical-thinking skills.
  • Eager to learn and to mentor others in a collaborative environment.

Responsibilities

  • Design, automate, and operate infrastructure resources in OCI (compute, storage, networking, load balancing).
  • Manage large-scale OKE clusters and containerized workloads.
  • Build automation for service provisioning, monitoring, and lifecycle management.
  • Develop dashboards, alerts, runbooks, and tooling to improve observability and reliability.
  • Troubleshoot and resolve complex production issues with a focus on resilience and uptime.
  • Contribute to service authentication, authorization, and security best practices.
  • Collaborate with software and ML engineers to deliver highly available AI infrastructure.
  • Participate in on-call rotations and improve incident response processes.

Career Level - IC3

Oracle

Information Technology

Redwood City
