Lead Site Reliability Engineer (GCP Ops, Terraform, Python, GitHub Actions)

10 - 16 years

22 - 37 Lacs

Posted: 1 week ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Position Overview:

Key Responsibilities:

  • Design, deploy, and maintain Kubernetes-based infrastructure to ensure high availability and scalability of applications.
  • Build and manage CI/CD pipelines using GitHub Actions to enable fast and reliable deployments.
  • Use Terraform to provision and manage infrastructure in Google Cloud Platform (GCP).
  • Manage and optimize Apache Kafka-based systems to ensure reliable message streaming and data processing.
  • Monitor and improve system performance and reliability using Prometheus and Grafana.
  • Collaborate with developers to automate workflows and implement best practices for infrastructure-as-code (IaC).
  • Write Python scripts for automation and tooling to enhance operational efficiency.
  • Troubleshoot and resolve system issues to minimize downtime and impact on users.
  • Participate in on-call rotations and incident response to ensure high service reliability.

Required Skills & Qualifications:

  • Familiarity with Google Cloud Platform (GCP) services such as Compute Engine, Kubernetes Engine, and Cloud Storage.
  • Hands-on experience with Kubernetes for deploying and managing containerized applications.
  • Understanding of GitHub Actions for creating and maintaining CI/CD pipelines.
  • Basic to intermediate knowledge of Terraform for infrastructure provisioning and management.
  • Proficiency in Python for scripting, automation, and tooling.
  • Experience with Apache Kafka for building, maintaining, and troubleshooting message-driven systems.
  • Experience using Prometheus and Grafana for monitoring and observability.
  • Strong problem-solving skills and an eagerness to learn new technologies.
  • Excellent communication and teamwork skills.

Nice-to-Have Skills (Optional):

  • Familiarity with other cloud providers (e.g., AWS or Azure).
  • Knowledge of Helm for Kubernetes package management.
  • Experience with debugging and optimizing distributed systems.
  • Exposure to security best practices for cloud infrastructure.
  • Knowledge of Java for developing and troubleshooting backend systems.
  • Familiarity with DataHub or similar data cataloging and metadata management platforms.
  • Understanding of Artificial Intelligence (AI) concepts and tools, such as building or managing machine learning pipelines, integrating AI models, or working with ML platforms like TensorFlow, PyTorch, or Vertex AI.
  • Experience with Golang for developing infrastructure tools or cloud-native applications.

Education & Experience:

  • Bachelor's degree in Computer Science, Information Technology, or related field (or equivalent work experience).
  • 1-3 years of experience in DevOps, SRE, or related roles (internships and project experience are acceptable for entry-level candidates).

Optum

Hospitals and Health Care

Eden Prairie MN
