Lead Site Reliability Engineer

7 - 14 years

0 Lacs

Posted: 5 days ago | Platform: GlassDoor

Work Mode

On-site

Job Type

Part Time

Job Description

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a myriad of innovative projects that deliver creative and cutting-edge solutions, and have the opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a skilled Lead Site Reliability Engineer to drive the stability, scalability, and reliability of our systems while improving efficiency through automation and best practices.

This role calls for deep expertise in DevOps methodologies, Infrastructure as Code (IaC), and collaboration across teams to ensure optimal system performance.

Responsibilities

  • Guide teams in designing, building, testing, and deploying changes to existing software
  • Enhance infrastructure security protocols in alignment with industry standards
  • Identify and implement automation solutions to replace manual processes
  • Evaluate IT infrastructure across cloud and on-premises environments, optimizing for performance and reliability
  • Collaborate with teams to ensure smooth integration and deployment processes
  • Oversee the implementation of CI/CD pipelines for seamless software delivery
  • Drive the adoption of Infrastructure as Code tools like Terraform for consistent environment provisioning
  • Establish monitoring and alerting systems to proactively address system issues
  • Lead incident management efforts to resolve and prevent recurring outages
  • Provide mentorship and technical guidance to team members in implementing best practices

Requirements

  • 7-14 years of relevant experience in Site Reliability Engineering or similar roles
  • Proficiency in CI/CD practices and automation frameworks
  • Strong knowledge of cloud platforms, such as Google Cloud Platform, and Linux-based systems
  • Expertise in Infrastructure as Code tools like Terraform for managing infrastructure automation
  • Background in Oracle RDBMS and Site Reliability Engineering principles
  • Knowledge of configuration management tools like Ansible and containerization technologies such as Docker
  • Familiarity with scripting languages for task automation (e.g., Python, Shell)
  • Capability to use Jenkins for continuous integration and deployment workflows

We offer

  • Opportunity to work on technical challenges that may have an impact across geographies
  • Vast opportunities for self-development: online university, global knowledge sharing, and learning through external certifications
  • Opportunity to share your ideas on international platforms
  • Sponsored Tech Talks & Hackathons
  • Unlimited access to LinkedIn Learning solutions
  • Possibility to relocate to any EPAM office for short and long-term projects
  • Focused individual development
  • Benefit package:
    • Health benefits
    • Retirement benefits
    • Paid time off
    • Flexible benefits
  • Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)
