Site Reliability Engineering Lead, SAP Global Systems

8 - 14 years

25 - 30 Lacs

Posted: 5 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Are you seeking an environment where you can drive innovation? Does the prospect of working with top engineering talent get you charged up? Apple is a place where extraordinary people gather to do their best work. Together we create products and experiences people once couldn't have imagined and now can't imagine living without.

Apple's IS&T manages key business and technical infrastructure at Apple -- how online orders are placed, the customer experience with technology in our retail stores, how much network capacity we need around the world and much more. The SAP Global Systems team within IS&T runs the Operations and Financial transactional platform that powers all of Apple's functions, such as Sales, Manufacturing, Distribution and Financials.

Think platform-as-product! Our team delivers great developer experiences to our Program, Project and Development teams through a curated set of tools, capabilities and processes offered through our Internal Developer Platform. We automate infrastructure operations, support complex service abstractions, build flexible workflows and curate a frictionless ecosystem that enables end-to-end collaboration to help drive productivity and engineering velocity. This is a tremendous opportunity for someone who has the skill to own initiatives and a passion to work in a highly integrated global solution platform! Join us in crafting solutions that do not yet exist!
Description
As a member of the Cloud Platform Engineering Team, you would architect and advocate for SRE principles across our engineering teams. You would develop scalable systems, foster operational excellence, and mentor a team of SRE and DevOps engineers.
Responsibilities
  • Build up, lead and improve existing processes to provide 24x7 operational response for applications in public cloud platforms.
  • Maintain services once they are live by setting up monitoring and alerting, and by measuring availability, latency, and overall system health.
  • Own and review work for accuracy, quality, application performance and completeness.
  • Review release readiness through activities such as system design consulting, reviewing all observability and monitoring, capacity planning, and launch reviews.
  • Understand processes to improve incident coordination among Apple teams.
  • Keep up to date with the newest technologies and tools and voice support for their value with the development teams.
  • Understand the core principles of DevSecOps.
  • Partner with architects and engineers to design and implement automation, operations, and support solutions.
  • Strive for top quality results and continuously look for ways to improve and enhance platform reliability, performance, and security.
  • Partner management.
  • Design and implement end-to-end observability frameworks using tools such as Prometheus, Grafana, CloudWatch, ELK/EFK, and OpenTelemetry, ensuring service reliability through dashboard design, SLOs/SLIs, and alerting systems.
Minimum Qualifications
  • 8 - 14 years of experience with a track record of building and leading Cloud Native SRE and Operations on hyperscalers such as AWS or GCP.
  • Solid experience supporting customer-facing applications in a 24x7 uptime environment of distributed systems.
  • Bachelor's degree or equivalent experience in Computer Science, Engineering or other relevant major.
  • Collaborate with security, development, and infrastructure teams to implement a Zero Trust Architecture, handle secrets securely, and establish secure CI/CD pipelines.
Preferred Qualifications
  • Expertise in SRE principles, production-scale system design, and DevOps practices.
  • Design and architect solutions for multi-cloud and on-prem environments.
  • Solid understanding of core cloud services such as IAM, EC2/GCE, RDS/CloudSQL, EKS/GKE, CloudWatch/Cloud Monitoring, and S3/GCS.
  • Understand complex landscape architectures. Have working knowledge of on-prem and cloud-based hybrid architectures and infrastructure concepts such as Regions, Availability Zones, VPCs/Subnets, load balancers, and API gateways.
  • Good understanding of common authentication schemes, certificates, secrets and protocols.
  • Implement infrastructure-as-code practices applying tools such as Terraform, Helm, or Pulumi.
  • Scripting and/or coding skills needed for automation, triaging and troubleshooting. Experience with a language such as Python, Go, or Java.
  • Experience planning and designing disaster recovery for BCP and non-BCP applications.
  • Core knowledge of standard security and governance processes.
  • Expertise handling production incidents, with experience working towards resolution and collaborator communication during incidents.
  • Track record with improving service reliability and efficiency.
  • Ability to implement and coordinate telemetry using monitoring and observability tools.
  • Adept at prioritizing multiple issues in a high-stress environment. Good experience in designing and improving response processes.
  • Mentor and foster professional development of junior SREs, thereby contributing to operational excellence across diverse environments.
  • Automation focus for operational efficiency: designing and implementing automation processes for repeatable and consistent service deployment.
  • A solid sense of ownership, critical thinking, and interpersonal skills to work effectively across diverse, multi-functional teams.
  • Certifications such as AWS Solutions Architect, AWS DevOps Professional, or GCP Professional Architect are a plus.

Apple

Computers and Electronics Manufacturing

Cupertino, California
