Site Reliability Engineering - GCP

5 - 10 years

6 - 9 Lacs

Posted: 1 week ago | Platform: Glassdoor

Work Mode

On-site

Job Type

Part Time

Job Description

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a wide range of innovative projects that deliver creative, cutting-edge solutions, and have the opportunity to learn and grow continuously. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a Site Reliability Engineer with expertise in Google Cloud Platform to join our multi-functional SRE team.

You will focus on enhancing operational automation and monitoring to improve efficiency within our cloud environments. Your role involves identifying repetitive tasks and implementing automated solutions to minimize manual effort. If you have a strong background in cloud engineering and are driven to optimize system reliability, we invite you to apply and contribute to our team.

Responsibilities

  • Act as the subject matter expert for operational automation and monitoring on Google Cloud Platform
  • Identify toil in existing systems and processes and recommend solutions to reduce manual tasks
  • Design and implement automated workflows to improve team efficiency
  • Define and create customer user journeys, service level objectives, service level indicators, and error budgeting based on non-functional requirements
  • Develop and maintain infrastructure as code using Terraform, with GitHub for version control
  • Write and maintain Bash, PowerShell, and Python scripts and Ansible playbooks to support automation
  • Manage containerized environments using Kubernetes
  • Collaborate with team members to reduce toil in the software development life cycle and IT operations
  • Utilize source control and code quality tools including Git, GitHub, and SonarQube
  • Apply understanding of IT service management processes to support operational excellence
  • Monitor and analyze system performance metrics using Prometheus and Grafana
  • Provide proactive and analytical insights to improve system reliability

Requirements

  • 5 to 10 years of experience in site reliability engineering or related cloud engineering roles
  • Strong knowledge of Google Cloud Platform and cloud engineering practices
  • Expertise in defining and implementing customer user journeys, service level objectives, service level indicators, and error budgeting
  • Proficiency in infrastructure as code with Terraform, using GitHub for version control
  • Skills in scripting with Bash, PowerShell, and Python, and in automation with Ansible
  • Competency in container orchestration with Kubernetes
  • Experience designing and implementing automated workflows to reduce manual effort
  • Familiarity with source control and code quality tools like Git, GitHub, and SonarQube
  • Understanding of IT service management processes
  • Analytical and proactive mindset to identify and solve operational challenges

We offer

  • Opportunity to work on technical challenges that may have an impact across geographies
  • Vast opportunities for self-development: online university, knowledge sharing opportunities globally, learning opportunities through external certifications
  • Opportunity to share your ideas on international platforms
  • Sponsored Tech Talks & Hackathons
  • Unlimited access to LinkedIn Learning solutions
  • Possibility to relocate to any EPAM office for short and long-term projects
  • Focused individual development
  • Benefit package:
    • Health benefits
    • Retirement benefits
    • Paid time off
    • Flexible benefits
  • Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)
