Site Reliability Engineer

4 - 7 years

50 - 65 Lacs

Posted: 2 days ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Position Summary

As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of Prophecy's platform across multi-cloud and SaaS environments. You will provide technical expertise in Kubernetes, networking, identity, observability, and automation, working to resolve challenges that impact the availability and resilience of our platform. Customers and internal teams will look to you for solutions ranging from infrastructure troubleshooting to complex architectural designs spanning Kubernetes, cloud-native services, and enterprise security. You will partner closely with product engineering and support teams to deliver a highly reliable experience to our enterprise customers.

The Impact You Will Have

  • Operate and optimize Kubernetes platforms (EKS, AKS, GKE) with Helm, namespaces, pods, autoscaling, node pools.
  • Manage ingress & networking: NGINX, ALB/AGIC, DNS, TLS/certificates, proxies, VNET/VPC routing, PrivateLink/peering.
  • Implement identity & secrets management: SSO (OIDC/SAML), SCIM, service principals/managed identities, vaults, key rotation.
  • Maintain platform service health across UI, APIs, orchestrators, workflow services using readiness/liveness probes and capacity planning.
  • Enable storage & I/O: object stores (S3, ADLS, GCS), DBFS mounts, IAM roles, access connectors, throughput/quota optimization.
  • Execute release & upgrades: version rollouts, canary/blue-green strategies, rollback automation, image registries, SBOM/vulnerability scanning.
  • Deliver observability: build dashboards, log pipelines, SLO/SLA monitoring with Prometheus, Grafana, CloudWatch, Log Analytics, ELK.
  • Strengthen resilience & DR: multi-AZ architectures, backup/restore, chaos testing, RTO/RPO validation, recovery runbooks.
  • Drive release automation: GitOps (ArgoCD/Flux), pre-flight checks, automated smoke tests, post-upgrade validation suites.
  • Ensure cloud-specific reliability: IAM, private connectivity, security groups, application gateways across AWS, Azure, GCP.
  • Enforce security & compliance: hardening against CIS benchmarks, network segmentation, vulnerability management, auditability.
  • Support high-governance SaaS deployments: dedicated SaaS controls, change control, strict egress policies, artifact provenance, customer-owned KMS.

What We Look For

  • 4-7 years in SRE, platform engineering, or enterprise production support.
  • Strong hands-on experience with Kubernetes and multi-cloud (AWS, Azure, GCP).
  • Expertise in networking, identity, secrets, and platform automation.
  • Proven track record in observability, reliability engineering, and incident management.
  • Familiarity with GitOps/CI/CD pipelines and modern automation practices.
  • Strong problem-solving, ownership, and ability to work in a fast-moving startup culture.
  • Technical degree or equivalent experience.
