2 years

3 - 7 Lacs

Posted: 11 hours ago | Platform: GlassDoor

Work Mode

On-site

Job Type

Part Time

Job Description

Role Overview:
We are looking for an experienced DevOps engineer who can own and scale production infrastructure end-to-end, from CI/CD and IaC to observability and incident response. You’ll lead design docs, harden reliability and security, and drive cost and performance efficiency.

What You’ll Do

  • Architect and maintain CI/CD pipelines (build, test, security scans, deploy, rollback) with quality gates and environment promotions.
  • Design and operate container platforms (ECS/EKS or equivalent), service discovery, blue/green & canary strategies, and autoscaling.
  • Implement Infrastructure as Code (Terraform/CDK/CloudFormation) and enforce modular, reviewable, and drift-free infrastructure.
  • Build observability: metrics/logs/traces, SLOs/SLIs, dashboards, and actionable alerts; reduce MTTR through runbooks and automation.
  • Champion platform reliability: capacity planning, HA/DR (multi-AZ), backup/restore testing, change management.
  • Own secrets management, IAM least-privilege, network policies, and baseline hardening (CIS where relevant).
  • Drive cost optimization (rightsizing, autoscaling policies, savings plans/spot, storage lifecycle) with monthly reporting.
  • Establish release/incident processes (postmortems, RCAs) and lead remediation to cut change failure rate.
  • Partner with Backend/AI/Frontend teams to productize models/services (GPU pools, batching, caching layers) and streamline developer workflows.
  • Lead design reviews, tech spikes, monitoring, and documentation.

Technical Qualifications

  • 2-3+ years in DevOps/SRE/Platform roles supporting production systems at scale.
  • Strong with AWS: VPC, IAM, ECS/EKS, ALB/NLB, RDS/ElastiCache/object storage, CloudWatch.
  • Proficient in Terraform (or CDK/CloudFormation), CI/CD (GitHub/GitLab/Jenkins/Argo) including artifacts and environment promotion.
  • Containers & orchestration: Docker, task definitions/Helm charts, autoscaling, health checks, readiness/liveness probes.
  • Observability: Prometheus/Grafana, OpenTelemetry, log pipelines (ELK/CloudWatch/Datadog), alert routing.
  • Networking & security: VPC/Subnets, SGs/NACLs, TLS, DNS, WAF, IAM design, secrets (KMS/Parameter Store/Vault).
  • Scripting/automation in Python/Bash, configuration management (Ansible or equivalent).
  • Proven incident management: on-call practice, runbooks, RCAs, tuning alerts to reduce noise.

Nice to Have

  • Kubernetes (EKS) production experience, service mesh (Istio/Linkerd), GitOps (ArgoCD/Flux).
  • Image and dependency security (Trivy/Grype/Snyk), SBOMs, policy-as-code (OPA/Conftest).
  • Data platform ops (MySQL/Postgres, PITR, replicas), streaming (Kafka/Kinesis).
  • Experience with the equivalent services in Azure.

Startup-Specific Expectations

  • Be comfortable with ambiguity and a fast-paced, evolving environment.
  • Proactively take on varied technical tasks outside your comfort zone.
  • Help reduce operational toil via automation and smarter tooling.
  • Contribute ideas on performance, cost savings, and process improvements.
