Principal Site Reliability Engineer

Eli Lilly And Company

8 - 13 years

20 - 25 Lacs

Hyderabad

Posted: 2 days ago


Skills Required

unix automation networking linux devops healthcare open source distribution system monitoring python

Work Mode

Work from Office

Job Type

Full Time

Job Description

As a Lead SRE Engineer, you will drive reliability, scalability, and operational excellence across a portfolio of applications deployed on a modern internal platform. You will lead and mentor a team of SRE engineers, establish best practices, and collaborate closely with product and development teams to ensure robust, automated, and self-healing systems. Your leadership will be critical in shaping the SRE function and enabling the team to deliver high-impact solutions that support Lilly's mission.

What You'll Be Doing

Lead the SRE team responsible for the reliability and performance of applications deployed on a cloud-native internal platform.

Design, implement, and maintain automation frameworks, self-service tooling, and auto-healing systems to eliminate manual toil.

Build and enhance end-to-end observability, monitoring, logging, and alerting systems for proactive issue detection and resolution.

Ensure Uptime: Take ultimate ownership of the stability of our production environments. Lead end-to-end incident management, from escalation to Root Cause Analysis (RCA). Manage patching, upgrades, and disaster recovery processes.

Champion Infrastructure as Code (IaC) and CI/CD best practices to ensure consistent, repeatable, and secure deployments.

Collaborate with development and product teams to embed reliability and scalability into application design and architecture.

Continuously evaluate and introduce emerging tools and technologies to keep the SRE stack modern and efficient.

Mentor and guide SRE engineers, fostering a culture of ownership, innovation, and continuous improvement.

Implement AIOps frameworks to improve operational tasks and enhance system self-healing capabilities.

Participate in and optimise the on-call rotation, striving to minimise human intervention through automation.

Drive capacity planning, disaster recovery, and business continuity initiatives.

Support onboarding, documentation, and knowledge sharing for platform services and operational best practices.

How You Will Succeed

Demonstrate technical leadership and strategic thinking in SRE practices.

Proactively identify and resolve reliability risks and bottlenecks.

Foster strong cross-functional relationships with engineering, product, and operations teams.

Lead by example in incident management, troubleshooting, and performance optimisation.

Promote a culture of blameless postmortems and continuous learning.

Effectively communicate complex technical concepts to both technical and non-technical stakeholders.

What You Should Bring

Proven experience leading SRE or DevOps teams in a complex, cloud-native environment.

Deep expertise in at least one major cloud platform (AWS, Azure, or GCP).

Advanced knowledge of Linux/Unix systems, networking, and distributed systems.

Proficiency in programming/scripting (Python, Go, or similar).

Hands-on experience with containers and orchestration (Docker, Kubernetes at scale).

Strong background in CI/CD pipelines and Infrastructure as Code (Terraform, Ansible, Helm, etc.).

Expertise with observability platforms (Prometheus, Grafana, ELK, Datadog, Splunk).

Experience with SRE practices (SLIs, SLOs, error budgets, blameless postmortems).

Excellent problem-solving, debugging, and performance optimisation skills.

Experience with security engineering, IAM, secrets management, and vulnerability scanning is a plus.

Exposure to cloud cost optimisation strategies is desirable.

Experience mentoring and developing engineers.

Basic Qualifications and Experience Requirement

Bachelor's degree in Computer Science, Engineering, or a related field.

8+ years of hands-on experience in SRE, DevOps, or related roles, with at least 2 years in a technical leadership capacity.

Demonstrated success in managing reliability for large-scale, distributed systems.

Relevant certifications (e.g., AWS Certified DevOps Engineer, CKA, etc.) are a plus.

Additional Skills/Preferences

Experience with AI/ML in operations (AIOps) for anomaly detection, predictive scaling, or automated incident triage.

Contribution to open-source projects or thought leadership in SRE/DevOps communities.

Knowledge of Agile principles and frameworks (e.g., Scrum, SAFe), including related tools (such as Jira).

Excellent analytical, problem-solving, and investigative skills.

Strong communication and collaboration skills.

Additional Information

Availability to work flexible hours may be required. This team supports continuous operations across two shifts; therefore, this role requires non-standard work hours and some work on weekends and holidays. Appropriate adjustments in benefits will be provided for employees working non-standard hours, where applicable.

Eli Lilly And Company

Pharmaceutical Manufacturing

Indianapolis, Indiana
