Senior Principal Site Reliability Engineer, Fusion SRE

6 - 11 years

8 - 13 Lacs

Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

As a Senior Principal Site Reliability Engineer, you will be a key member of a high-impact team focused on the availability, performance, and operational excellence of Fusion SRE Middleware. You will take ownership of production environments, including systems and the Fusion Middleware stack, and support mission-critical business operations for Cloud Fusion Applications.

Your role will emphasize automation and optimization of operations across multiple production environments, recommending AI-driven solutions to enhance availability, performance, and supportability. You will harness AI-based tools and predictive analytics to proactively identify issues, automate incident responses, and continuously improve system resilience. Additionally, you will provide escalation support for complex production problems, guide junior engineers, participate in major incident bridges, and help build and refine processes and procedures using AI-powered insights to drive smarter, data-driven decisions.

Our team is front-and-center in reducing event duration, leveraging operational experience, best practices, and tool development to automate incident management and drive continual improvement.

About the Role:

We seek a Senior Principal SRE to join our globally distributed team, responsible for detecting, triaging, and mitigating service-impacting events rapidly and effectively through automation and AI-powered insights. You will be part of a regional team, minimizing Fusion services downtime through exceptional incident management and system operations, with a strong emphasis on scalability, performance, security, and AI-driven optimization. In this dynamic role, you will gain deep insight into the inner workings of Oracle Cloud Fusion Apps, using AI tools to predict, identify, and address potential issues before they impact services. You'll influence cross-functional leaders and drive programs that boost service availability while leveraging AI to enhance real-time decision-making and improve operational efficiency.
If you're passionate about leveraging AI to break new ground as part of an agile team, we want to speak with you!

 
 

Key Responsibilities:

  • Automation:

    Develop and optimize operations through AI-powered automation. Apply machine learning and orchestration principles to every possible opportunity, reducing manual intervention and technical debt. Enhance operational outcomes with scalable, AI-driven automation solutions that anticipate issues and optimize system performance proactively.
  • Middleware Technology Expert:

    Lead L3 WebLogic Administration, managing server lifecycle, configuring and deploying applications, and monitoring server and application resources. Leverage AI-driven monitoring tools to proactively detect and resolve issues across application and infrastructure layers, ensuring efficient and automated troubleshooting.
  • Service Ownership:

    Act as a Service Owner for Fusion Apps customers, sharing full-stack ownership of critical services in partnership with Service Development and Operations. Utilize AI-based analytics to predict potential service disruptions and optimize service delivery to improve customer satisfaction and minimize downtime.
  • Technical Expertise:

    Provide deep technical guidance and serve as the ultimate escalation point for complex issues not documented in SOPs. Participate in major incident management as a subject matter expert, leveraging your understanding of service topologies, AI-driven insights, and dependencies to troubleshoot and resolve issues quickly and effectively.
  • Ownership Scope:

    Understand end-to-end configuration, dependencies, and behavioral characteristics of production services. Use AI-powered telemetry and monitoring systems to ensure mission-critical delivery with a focus on system health, security, resiliency, scale, and performance.
  • Service Requirements:

    Provide strategic direction and prioritization to Product Management and Service Development teams, guiding the addition of AI-enhanced capabilities to Oracle SaaS/ERP services. Act as an escalation point for undocumented or critical issues, leveraging AI tools to aid in faster resolution and proactive service improvements.

Professional Skills Requirements:

  • Excellent written and verbal communication, facilitation, and interpersonal skills.
  • Strong collaboration, customer service, empathy, flexibility, and conflict resolution abilities.
  • Ability to communicate clearly with technical and non-technical stakeholders.
  • Effective at working independently and managing multiple projects or responsibilities.
  • Highly motivated with the ability to thrive in fast-paced, team-oriented environments.
  • Strong analytical and problem-solving skills.
  • Adaptability to evolving priorities and deadlines.
  • Strong global teamwork skills.
  • Proven ability to handle multiple, competing priorities.

Required Qualifications:

  • Bachelor's degree in Computer Science or a related field, or equivalent experience.
  • 6+ years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering.
  • 6+ years of hands-on automation experience using Python or Unix Shell Scripting.
  • Experience with AI-driven monitoring and predictive analytics:
    • Experience with AI/ML algorithms for performance optimization and incident prevention.
    • Hands-on experience integrating AI models and machine learning algorithms into automation workflows to enhance system resilience, reduce downtime, and improve operational efficiency.
    • Experience leveraging AI technologies, such as natural language processing (NLP) or machine learning, to accelerate incident detection, root cause analysis, and troubleshooting.
    • Familiarity with using AI-based orchestration tools for intelligent decision-making and automated incident response.
  • Proven expertise in designing and implementing solutions for telemetry, monitoring, scalability, performance, and reliability at both platform and application layers.
  • Demonstrated success working collaboratively with multiple teams to manage dependencies.
  • Production support experience in Middleware administration, with expertise in Oracle WebLogic Administration on Linux/Unix platforms.
  • Extensive experience in major incident management and change management processes.
  • Strong skills in:
    • Systems and network administration, application security, DevOps, or SRE
    • Web protocols, Linux/Unix tools and architecture (from kernel to shell, file systems, client-server protocols)
    • Analyzing and troubleshooting large-scale distributed services
    • Building automated tools (Python or Shell scripting)
    • WLST (WebLogic Scripting Tool) for monitoring and automation
    • Parallel job execution frameworks such as Marionette Collective (MCollective) or Salt (a plus)
    • Professional software engineering practices (Agile, coding standards, code reviews, source control, builds, testing, operations)
    • Infrastructure-as-a-Service, CI/CD systems, RESTful APIs, log analysis, and debugging tools
  • Proficient with WebLogic tools (Server Administrator Console, WLST, Ant tasks, SNMP Agents).
  • Administration experience with web servers such as OHS (Oracle HTTP Server) or Apache.
  • Experience with monitoring and alerting technologies (e.g., Prometheus, Sensu, Nagios, Kafka, Wavefront, BigPanda, Datadog, PagerDuty).
  • Basic Linux system administration, networking, storage, compute, and virtualization knowledge.
  • Self-motivated and resilient, able to move forward amid ambiguity.
  • Familiarity with Java heap heuristics and garbage collection policies.
  • Deep understanding of performance concepts (response time, throughput, resource utilization).
  • Experience with monitoring tools (Prometheus, Grafana) and alerting mechanisms.
  • Familiarity with ticket tracking systems, such as JIRA.
  • Ability and willingness to quickly learn new technical disciplines and train others.

Preferred Qualifications:

  • Experience with Fusion Apps functional flows.
  • Java programming experience and an understanding of structured SQL statements.
  • Knowledge of Oracle Business Intelligence Enterprise Edition (OBIEE) and Oracle Service-Oriented Architecture (SOA).

Oracle

Information Technology

Redwood City
