AIOps Engineer

3 years

8 - 10 Lacs

Posted: 1 day ago | Platform: Glassdoor

Work Mode: On-site

Job Type: Full Time

Job Description

DevOps/AIOps Engineer (Platform)

Experience: 3–5 Years

About the Company

We aim to bring about a new paradigm in medical image diagnostics — intelligent, holistic, ethical, explainable, and patient‑centric. We’re looking for innovative problem‑solvers who empathize with clinicians and patients, understand business problems, and can design and deliver reliable, intelligent products.

Key Responsibilities

  • CI/CD for services & models: Own pipelines (GitHub Actions/GitLab CI), environment gates, artifact/version governance (containers, models, SBOMs), safe rollouts & instant rollbacks.
  • Kubernetes platform (EKS preferred): Operate multi-env clusters; Helm/Kustomize; GitOps (Argo CD/Flux); progressive delivery (canary/blue-green/Argo Rollouts/Flagger).
  • Serving & APIs: Deploy and tune FastAPI services and Triton/ONNX/TensorRT inference; traffic shaping, runtime config, autoscaling signals.
  • Event-driven orchestration: Build robust consumers/producers on RabbitMQ/ActiveMQ/Kafka with back-pressure, dead-lettering, idempotency, and retry patterns.
  • Observability & AIOps: Define SLIs/SLOs and error budgets; metrics/logs/traces (Prometheus/Grafana/Loki/Tempo/ELK); intelligent alerting & noise reduction; basic model/data drift hooks.
  • Security in SDLC: Supply-chain security (image signing/provenance, SBOM scans), SAST/DAST/IaC scanning, policy-as-code (OPA/Gatekeeper), secrets hygiene in pipelines/workloads.
  • Data/Model platform integration: S3/MinIO for artifacts; integrate model registry (MLflow or similar) into CD; immutable, traceable releases.
  • Resilience & performance: Capacity planning (incl. GPU), autoscaling (HPA/VPA/KEDA), caching/queue tuning; chaos/game-days; write runbooks and own incident response for platform services.
  • Developer experience: Golden paths, starter repos, internal Helm charts, docs & enablement to make shipping boring and fast.
  • FinOps mindset: Cost dashboards, right-sizing, bin-packing, GPU utilization policies, spot vs on-demand strategy.

Skills and Qualifications (Required)

  • 3+ years in DevOps/SRE/MLOps with strong Docker & Kubernetes fundamentals.
  • Production CI/CD expertise; canary/blue-green; artifact & version management.
  • IaC (Terraform) and GitOps workflows (Argo CD/Flux).
  • Observability: Prometheus/Grafana; logs/traces with Loki/Tempo/ELK.
  • Production message queues (RabbitMQ/ActiveMQ/Kafka) with back-pressure & retries.
  • Cloud experience (AWS/GCP/Azure), EKS preferred; object storage (S3/MinIO); model registries (MLflow or similar).
  • Security in SDLC and compliance guardrails for PHI-like data (least-privilege IAM, secrets, auditability).
  • Incident response experience; writing SLIs/SLOs, runbooks, and operating to error budgets.
  • Scripting for platform tasks (Python/Bash).

Preferred

  • Triton Inference Server, ONNX/TensorRT optimizations; GPU scheduling on K8s (NVIDIA device plugin, MIG, node pools).
  • Argo Rollouts/Flagger, Karpenter, KEDA; caching layers (Redis/NVCache patterns).
  • Policy-as-code (OPA/Gatekeeper), image signing (cosign), SBOM tools (syft/grype).
  • Network savvy for app delivery (ingress, service meshes, egress policies).

Education

BE/B.Tech or equivalent experience.

Location & Work Setup

On-site - Gurugram

Job Type: Full-time

Pay: ₹800,000.00 - ₹1,000,000.00 per year

Application Question(s):

  • How many years of hands-on production Kubernetes experience do you have?
  • Which CI/CD tools have you used to deploy microservices?
  • Do you have experience with progressive delivery and rollbacks?
  • What is your experience with serving ML models in production?
  • How do you evaluate and track the performance of a model in production?

Work Location: In person
