Site Reliability Engineer - Python, GCP

3 - 5 years

13 - 15 Lacs

Posted: 3 months ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

  • Design, implement, and maintain scalable and highly reliable cloud infrastructure using Google Cloud Platform (GCP) services such as Compute Engine, Kubernetes Engine, Cloud Functions, and BigQuery.
  • Write Python scripts to automate operations and deployment processes and to enhance system performance.
  • Collaborate with engineering teams to improve system architecture, application deployment, and continuous integration/continuous deployment (CI/CD) pipelines.
  • Develop and maintain system observability frameworks, including logs, metrics, and tracing, to ensure visibility into system health and performance.
  • Implement and manage monitoring and alerting systems using tools like Prometheus, Grafana, or Stackdriver to ensure system reliability and uptime.
  • Participate in on-call rotations to address production incidents and drive incident management and root cause analysis.
  • Improve system performance, cost management, and security using GCP-native tools.
  • Define and track SLOs (Service Level Objectives) and SLIs (Service Level Indicators) to ensure that systems meet reliability targets.
  • Automate and streamline processes for system provisioning, configuration, and deployment.
  • Conduct post-incident reviews to identify areas for improvement and prevent recurrence of issues.

  • 4+ years of experience in Site Reliability Engineering (SRE), DevOps, or similar roles.
  • Strong experience with Python programming, including automation, scripting, and system management tools.
  • Hands-on experience with Google Cloud Platform (GCP) services such as Compute Engine, Kubernetes Engine, Cloud Functions, and BigQuery.
  • Strong understanding of containerization and orchestration tools, particularly Docker and Kubernetes.
  • Proficiency in monitoring and alerting tools such as Prometheus, Grafana, Stackdriver, or similar.
  • Experience working with CI/CD tools and practices (e.g., GitLab, Jenkins).
  • Solid understanding of system performance optimization, security, and cost management practices on GCP.
  • Strong knowledge of networking concepts, high-availability architectures, and system troubleshooting techniques.
  • Experience with infrastructure automation and configuration management tools (e.g., Terraform, Ansible).
  • Experience in production environment management, incident resolution, and on-call support.
  • Good understanding of software development practices and of collaborating with development teams to improve reliability.

UST

IT Services and IT Consulting

Aliso Viejo CA

10001 Employees

1845 Jobs

Key People

  • Kris Canekeratne, Co-Founder & CEO
  • Sandeep Reddy, President
