Principal Site Reliability Developer

6 - 10 years

0 Lacs

Posted: 2 days ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

What You'll Do

  • Design and automate the lifecycle of resources in Oracle Cloud Infrastructure: compute, network, storage, load balancing, etc.
  • Build and maintain our automated service provisioning features.
  • Manage our cloud-scale service in production. Develop dashboards, alerts, runbooks, and automation.
  • Contribute to service authentication, authorization, and other security features.
  • Work closely with other engineers across the Engineering team to deliver a state-of-the-art product.

Qualifications

  • Experience building high-performance, resilient, scalable, and well-engineered systems
  • Experience building systems using Infrastructure as Code with at least one public cloud provider
  • Proficient in Linux, Python, and Bash
  • Working knowledge of at least one public cloud networking/security API (e.g., AWS VPC, security groups).
  • Working knowledge of at least one scripting language (e.g., Python, Ruby, JavaScript)
  • Past participation in an on-call rotation and experience improving on-call documentation and tools.
  • Experience with agile software development.
  • Good communication skills, with ability to clearly articulate engineering designs verbally and in writing.
  • Solid grasp of everyday Git commands and workflows.
  • Experience with at least one family of monitoring/logging/observability tools (e.g., Elasticsearch, Prometheus, Fluentd)
  • Disaster recovery, redundancy, and operational uptime planning experience
  • Experience with CI/CD tools and DevOps processes, and working knowledge of Docker and Kubernetes clusters
  • Familiarity with the OSI model, CIDR, and routing
  • Relevant experience of 6 to 10 years

Desired Skills

  • Resourcefulness in the face of unique constraints.
  • Always iterating on ways to be more productive and effective.
  • A habit of capturing and prioritizing the automation of toil.
  • Willingness to bear your share of and actively improve a rotating on-call schedule.
  • General problem-solving skills, critical thinking, and attention to detail.
  • Eagerness to learn and to teach.

Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Take responsibility for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Hold authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas, and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.

Career Level - IC4

Oracle

Information Technology

Redwood City
