Posted: 3 days ago
Work from Office
Full Time
Job Overview
We are hiring a seasoned Site Reliability Engineer with strong experience in building and operating scalable systems on Google Cloud Platform (GCP). You will be responsible for ensuring system availability, performance, and security in a complex microservices ecosystem, while collaborating cross-functionally to improve infrastructure reliability and developer velocity.
Key Responsibilities
- Design and maintain highly available, fault-tolerant systems on GCP using SRE best practices.
- Implement SLIs/SLOs, monitor error budgets, and lead post-incident reviews with RCA documentation.
- Automate infrastructure provisioning (Terraform/Deployment Manager) and CI/CD workflows.
- Operate and optimize Kubernetes (GKE) clusters including autoscaling, resource tuning, and HPA policies.
- Integrate observability across microservices using Prometheus, Grafana, Stackdriver, and OpenTelemetry.
- Manage and fine-tune databases (MySQL/Postgres/BigQuery/Firestore) for performance and cost.
- Improve API reliability and performance through Apigee (proxy tuning, quota/policy handling, caching).
- Drive container best practices including image optimization, vulnerability scanning, and registry hygiene.
- Participate in on-call rotations, capacity planning, and infrastructure cost reviews.
Must-Have Skills
- Minimum 8 years of total experience, with at least 3 years in SRE, DevOps, or Platform Engineering roles.
- Strong expertise in GCP services (GKE, IAM, Cloud Run, Cloud Functions, Pub/Sub, VPC, Monitoring).
- Advanced Kubernetes knowledge: pod orchestration, secrets management, liveness/readiness probes.
- Experience writing automation tools and scripts in Python, Bash, or Go.
- Solid understanding of incident response frameworks and runbook development.
- CI/CD expertise with GitHub Actions, Cloud Build, or similar tools.
Good to Have
- Apigee hands-on experience: API proxy lifecycle, policies, debugging, and analytics.
- Database optimization: index tuning, slow query analysis, horizontal/vertical sharding.
- Distributed monitoring and tracing: familiarity with Jaeger, Zipkin, or Google Cloud Trace.
- Service Mesh (Istio/Linkerd) and secure workload identity configurations.
- Exposure to BCP/DR planning, infrastructure threat modeling, and compliance (ISO/SOC2).
Educational & Certification Requirements
- B.Tech / M.Tech / MCA in Computer Science or equivalent.
- GCP Professional Cl
LANDMARK GROUP
27.5 - 35.0 Lacs P.A.
Hyderābād