Posted: 1 day ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Roles & Responsibilities

  • System Reliability, Performance Optimization & Cost Reduction

    Ensure the reliability, scalability, and performance of Amgen's infrastructure, platforms, and applications. Proactively identify and resolve performance bottlenecks and implement long-term fixes. Continuously evaluate system design and usage to identify opportunities for cost optimization, ensuring infrastructure efficiency without compromising reliability.
  • Automation & Infrastructure as Code (IaC)

    Drive the adoption of automation and Infrastructure as Code (IaC) across the organization to streamline operations, minimize manual interventions, and enhance scalability. Implement tools and frameworks (such as Terraform, Ansible, or Kubernetes) that increase efficiency and reduce infrastructure costs through optimized resource utilization.
  • Standardization of Processes & Tools

    Establish standardized operational processes, tools, and frameworks across Amgen's technology stack to ensure consistency, maintainability, and best-in-class reliability practices. Champion the use of industry standards to optimize performance and increase operational efficiency.
  • Monitoring, Incident Management & Continuous Improvement

    Implement and maintain comprehensive monitoring, alerting, and logging systems to detect issues early and ensure rapid incident response. Lead the incident management process to minimize downtime, conduct root cause analysis, and implement preventive measures to avoid future occurrences. Foster a culture of continuous improvement by leveraging data from incidents and performance monitoring.
  • Collaboration & Cross-Functional Leadership

    Partner with software engineering and IT teams to integrate reliability, performance optimization, and cost-saving strategies throughout the development lifecycle. Act as a subject-matter expert (SME) on SRE principles and advocate for best practices on assigned projects.
  • Capacity Planning & Disaster Recovery

    Execute capacity planning processes to support future growth, performance, and cost management. Maintain disaster recovery strategies to ensure system reliability and minimize downtime in the event of failures.

Must-Have Skills:

  • Experience with AWS Cloud Services
  • Experience with CI/CD, Infrastructure as Code (IaC), and observability; GitOps experience is an added advantage
  • Exposure to containerization (Docker) and orchestration tools (Kubernetes) to optimize resource usage and improve scalability is an added advantage
  • Ability to learn new technologies quickly. Strong problem-solving and analytical skills. Excellent communication and teamwork skills.

Good-to-Have Skills:

  • Knowledge of cloud-native technologies and strategies for cost optimization in multi-cloud environments.
  • Familiarity with distributed systems, databases, and large-scale system architectures.
  • Knowledge of or exposure to Databricks (candidates without it will need to upskill if hired).