Sr. Site Reliability Engineer

Amgen Inc

8 - 13 years

8 - 12 Lacs

Hyderabad

Posted: 1 month ago

Skills Required

Linux system administration, Python, Golang, networking, Bash, Kubernetes, information technology, CI/CD, cloud platforms, Ansible, process management, incident management, system performance, Terraform, AWS

Work Mode

Work from Office

Job Type

Full Time

Job Description

What you will do

In this vital role, you will help build, scale, and secure the platforms that underpin Amgen's global digital initiatives. The role focuses on ensuring the reliability, performance, and efficiency of cloud-native platforms while enabling development velocity and operational excellence.

You will be responsible for designing and operating infrastructure and shared platforms used across the enterprise, including CI/CD, observability, incident management, and collaboration systems.

You will work extensively with containerized environments, manage multi-tenant Kubernetes platforms, and automate processes to improve resilience and reduce operational burden. This role requires deep technical expertise, leadership skills, and the ability to drive initiatives across cross-functional teams and global stakeholders.

Roles & Responsibilities:

Platform Reliability Engineering

Design, operate, and scale secure, highly available cloud-based infrastructure using Infrastructure as Code (IaC).
Manage multi-tenant container orchestration environments with advanced access controls, workload isolation, and governance policies.
Ensure enterprise CI/CD platforms are performant, secure, and optimized for high-throughput engineering teams.

Monitoring, Observability & Incident Management

Build and manage observability platforms that provide full-stack visibility through metrics, logs, and traces.
Define, implement, and continuously refine SLIs, SLOs, and error budgets for platform health and service performance.
Automate incident response workflows, integrate with incident management platforms, and lead post-incident reviews and root cause analysis.

Enterprise Platform Administration

Operate and improve core engineering platforms (e.g., CI/CD, collaboration, knowledge sharing) to ensure availability, security, and ease of use.
Automate platform provisioning, upgrades, access controls, and integration pipelines to reduce manual effort and improve consistency.
Implement compliance, audit logging, and policy enforcement through code-driven governance models.

AI Adoption & Enablement

Drive the adoption of AI/ML-based tools to enhance observability, incident prediction, remediation, and intelligent alerting.
Evaluate and integrate AI-assisted automation platforms to reduce toil and improve operational efficiency.
Partner with platform, security, and development teams to embed predictive analytics into dashboards, workflows, and root cause tooling.
Champion a data-driven SRE practice by enabling thoughtful insights and anomaly detection across systems and platforms.

Leadership & Collaboration

Serve as a technical thought leader and mentor within the SRE organization.
Promote SRE principles and reliability culture across engineering teams.
Collaborate with cross-functional stakeholders to influence architecture, roadmaps, and platform investment.
Lead operational reviews and service health retrospectives, with a focus on continuous improvement.
Participate in Agile and SAFe delivery processes, including sprint planning, stand-ups, retrospectives, and PI planning, to ensure security and platform reliability are embedded across development cycles.

Basic Qualifications:

Doctorate, Master's, or Bachelor's degree and 8 to 13 years of experience in Computer Science, Information Technology, or a related technical field
Demonstrated success operating cloud-native infrastructure in production environments
Practical experience managing Kubernetes clusters and CI/CD environments at enterprise scale
Exposure to global on-call or incident support rotations
Excellent collaboration and communication skills across technical and non-technical teams

Preferred Qualifications:

Must-Have Skills:

Deep experience with cloud platforms (AWS, Azure, or GCP), including services such as compute, networking, IAM, and VPC design
Proven proficiency in Infrastructure as Code (IaC) using tools such as Terraform or CloudFormation
Advanced skills in managing container orchestration platforms (e.g., Kubernetes), including workload isolation, resource quotas, and role-based access control
Strong understanding of Linux system administration, process management, and system performance tuning
Hands-on experience with CI/CD platforms and pipelines (build automation, artifact storage, environment provisioning, rollback strategies)
Strong background in observability tooling, including Prometheus, Grafana, Dynatrace, and distributed tracing frameworks like OpenTelemetry or Jaeger
Strong practical experience with incident management platforms and practices (e.g., alert routing, runbooks, escalation paths)
Automation and scripting proficiency in languages such as Python, Go, or Bash
Experience with configuration management tools like Ansible, Chef, or SaltStack
Strong grasp of networking fundamentals, such as routing, DNS, OSI layers, load balancing, firewalls, TLS, and security groups
Version control and collaboration workflows using Git and GitOps principles
Experience with enterprise collaboration platforms, including provisioning, integration, and permission control

Good-to-Have Skills:

Exposure to service mesh technologies (e.g., Istio, Linkerd) and zero-trust network concepts
Familiarity with secrets management platforms (e.g., HashiCorp Vault, AWS Secrets Manager)
Experience using incident response and chaos engineering tools (e.g., Gremlin, Chaos Mesh)
Background in cost optimization, budgeting, and resource tracking (FinOps)
Awareness of policy-as-code frameworks (e.g., OPA, Kyverno)
Familiarity with feature flagging and progressive delivery tools (e.g., LaunchDarkly, Argo Rollouts)
Integration experience with ticketing and change management platforms (e.g., ServiceNow, Jira)
Understanding of compliance standards (e.g., HIPAA, GDPR, SOC 2) and how they apply to infrastructure operations
Understanding of security and encryption technologies and authentication protocols such as OpenID, OIDC, OAuth, SAML, and LDAP

Professional Certifications (Preferred)

Cloud DevOps Certification (AWS/Azure/GCP)
Certified Kubernetes Administrator (CKA) or Security Specialist (CKS)
CI/CD Platform Certification
ITIL Foundation or equivalent service management certification

Soft Skills:

High level of ownership and accountability for platform reliability
Strong diagnostic and analytical capabilities with a bias for action
Clear and confident communicator with an ability to influence without authority
Passion for automation, operational excellence, and team mentorship

Amgen Inc

Biotechnology

Thousand Oaks