Posted: 2 weeks ago
On-site
Full Time
Job Description

We are looking for a highly experienced Site Reliability Engineer (SRE) with deep expertise in Google Cloud Platform (GCP), a strong background in building and maintaining reliable, scalable systems, and a minimum of 8 years of experience.

Key Responsibilities:
- Design, implement, and maintain cloud infrastructure on GCP and Azure.
- Build and manage observability solutions using Grafana, Prometheus, Loki, and Tempo.
- Develop automation tools and scripts in Python to improve system reliability and reduce manual effort.
- Support Microsoft Azure applications.
- Deploy and manage containerized applications using Kubernetes and other container technologies.
- Participate in on-call rotations, incident response, and root cause analysis.
- Define and monitor SLAs and manage error budgets to ensure service reliability.
- Collaborate with development and operations teams to improve system performance and supportability.

Required Qualifications:
- 8+ years of experience in SRE or Cloud Engineering roles.
- Strong hands-on experience with GCP services and architecture.
- Proficiency in Python for scripting and automation.
- Experience with observability tools: Grafana, Prometheus, Loki, Tempo.
- Solid understanding of Kubernetes and container orchestration.
- Experience with Azure cloud.
- Strong troubleshooting and problem-solving skills.
- Excellent communication and collaboration abilities.

Preferred/Added Advantage:
- Experience in support operations, including incident management.
- GCP certifications.
- Familiarity with CI/CD pipelines and DevOps best practices.
Noida, Uttar Pradesh, India