Principal Site Reliability Engineer

Eli Lilly and Company

8 years

0 Lacs

hyderabad telangana india

Posted: 4 days ago

Skills Required

reliability, discovery, healthcare, management, technology, data, software, engineering, drive, scalability, portfolio, development, leadership, support, design, automation, service, tooling, auto, monitoring, logging, stability, escalation, analysis, patching, code, architecture, stack, planning, onboarding, documentation, resolve, troubleshooting, learning, DevOps, AWS, Azure, GCP, Linux, Unix, networking, programming, scripting, Python, orchestration, Docker, Kubernetes, Terraform, Ansible, Helm, Datadog, Splunk, debugging, security, IAM, scanning, strategies, mentoring, certifications, AI, ML, scaling, triage, Agile, Scrum, Jira, communication, collaboration

Work Mode

On-site

Job Type

Full Time

Job Description

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

About The Technology Organization

Technology at Lilly builds and maintains capabilities using pioneering technologies, as the most prominent tech companies do. What differentiates Technology at Lilly is that we use technology to create new possibilities, such as data-driven drug discovery and connected clinical trials, that advance our purpose: creating medicines that make life better for people around the world. We hire the best technology professionals from a variety of backgrounds, so they can bring an assortment of knowledge, skills, and diverse thinking to deliver solutions in every area of our business.

About The Business Function

The Software Product Engineering (SPE) team is a specialised engineering group that delivers strategic solutions and differentiated capabilities. We take a forward-thinking approach, focusing on an enterprise platform and product mindset, ensuring that the solutions we build can be leveraged across Technology teams for broader impact and efficiency.

Job Title:

Principal Site Reliability Engineer

Role Summary

As a Principal Site Reliability Engineer, you will drive reliability, scalability, and operational excellence across a portfolio of applications deployed on a modern internal platform. You will lead and mentor a team of SREs, establish best practices, and collaborate closely with product and development teams to ensure robust, automated, and self-healing systems. Your leadership will be critical in shaping the SRE function and enabling the team to deliver high-impact solutions that support Lilly’s mission.

What You’ll Be Doing

Lead the SRE team responsible for the reliability and performance of applications deployed on a cloud-native internal platform.
Design, implement, and maintain automation frameworks, self-service tooling, and auto-healing systems to eliminate manual toil.
Build and enhance end-to-end observability, monitoring, logging, and alerting systems for proactive issue detection and resolution.
Ensure Uptime: Take ultimate ownership of our production environment's stability. Lead end-to-end incident management, from escalation to Root Cause Analysis (RCA). Manage patching, upgrades, and disaster recovery processes.
Champion Infrastructure as Code (IaC) and CI/CD best practices to ensure consistent, repeatable, and secure deployments.
Collaborate with development and product teams to embed reliability and scalability into application design and architecture.
Continuously evaluate and introduce emerging tools and technologies to keep the SRE stack modern and efficient.
Mentor and guide SRE engineers, fostering a culture of ownership, innovation, and continuous improvement.
Implement AIOps frameworks to improve operational tasks and enhance system self-healing capabilities.
Participate in and optimise the on-call rotation, striving to minimise human intervention through automation.
Drive capacity planning, disaster recovery, and business continuity initiatives.
Support onboarding, documentation, and knowledge sharing for platform services and operational best practices.

How You Will Succeed

Demonstrate technical leadership and strategic thinking in SRE practices.
Proactively identify and resolve reliability risks and bottlenecks.
Foster strong cross-functional relationships with engineering, product, and operations teams.
Lead by example in incident management, troubleshooting, and performance optimisation.
Promote a culture of blameless postmortems and continuous learning.
Effectively communicate complex technical concepts to both technical and non-technical stakeholders.

What You Should Bring

Proven experience leading SRE or DevOps teams in a complex, cloud-native environment.
Deep expertise in at least one major cloud platform (AWS, Azure, or GCP).
Advanced knowledge of Linux/Unix systems, networking, and distributed systems.
Proficiency in programming/scripting (Python, Go, or similar).
Hands-on experience with containers and orchestration (Docker, Kubernetes at scale).
Strong background in CI/CD pipelines and Infrastructure as Code (Terraform, Ansible, Helm, etc.).
Expertise with observability platforms (Prometheus, Grafana, ELK, Datadog, Splunk).
Experience with SRE practices (SLIs, SLOs, error budgets, blameless postmortems).
Excellent problem-solving, debugging, and performance optimisation skills.
Experience with security engineering, IAM, secrets management, and vulnerability scanning is a plus.
Exposure to cloud cost optimisation strategies is desirable.
Experience mentoring and developing engineers.

Basic Qualifications And Experience Requirement

Bachelor’s degree in Computer Science, Engineering, or related field.
8+ years of hands-on experience in SRE, DevOps, or related roles, with at least 2 years in a technical leadership capacity.
Demonstrated success in managing reliability for large-scale, distributed systems.
Relevant certifications (e.g., AWS Certified DevOps Engineer, CKA, etc.) are a plus.

Additional Skills/Preferences

Experience with AI/ML in operations (AIOps) for anomaly detection, predictive scaling, or automated incident triage.
Contribution to open-source projects or thought leadership in SRE/DevOps communities.
Knowledge of Agile principles and frameworks (e.g., Scrum, SAFe), including related tools (such as Jira).
Excellent analytical, problem-solving, and investigative skills.
Strong communication and collaboration skills.

Additional Information

Availability to work flexible work hours is/may be required. This team will support continuous operations across two shifts, and therefore this role will require non-standard work hours and some work on weekends and holidays. Appropriate adjustments in benefits will be provided for employees working non-standard hours where applicable.

Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form (https://careers.lilly.com/us/en/workplace-accommodation) for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly
