DevOps Manager

Posted: 12 hours ago | Platform: LinkedIn

Work Mode: Remote

Job Type: Full Time

Job Description

Company Description

Scry AI is a research-led enterprise AI company that builds intelligent platforms to drive efficiency, insight, and compliance. Our platforms, Collatio®, Auriga®, and Concentio®, streamline complex workflows by automating data extraction, validation, and reconciliation and by delivering real-time intelligence.


We are seeking a DevOps Manager to lead our infrastructure, CI/CD, and reliability practices across cloud and on-prem deployments. You will own uptime, performance, security, and cost efficiency for AI/ML workloads powering Collatio®, Auriga®, and Concentio®.


Role Overview

As DevOps Manager, you will lead a small team of DevOps/SRE engineers to design, automate, and operate secure, compliant, and highly available platforms across AWS/Azure/GCP and customer on-prem environments. You will standardize IaC, improve CI/CD velocity, build robust observability, and enable GPU-accelerated AI inference at scale for enterprise clients.


Key Responsibilities

Platform Reliability & Operations

• Own SLOs/SLIs, availability, latency, and capacity planning across services.

• Lead incident response, root-cause analysis, postmortems, and on-call processes.

• Implement backup, disaster recovery, and business continuity for multi-region and on-prem deployments.


Cloud, On-Prem & Edge Deployments

• Architect Kubernetes platforms (managed and self-hosted), including RBAC, network policies, and secrets management.

• Standardize infrastructure with Terraform, Helm, and GitOps (Argo CD) for repeatable customer deployments.

• Support Concentio® edge/IoT rollouts with secure remote updates and telemetry pipelines.


AI/ML & Data Infrastructure

• Enable GPU scheduling and drivers (CUDA, NVIDIA), inference runtimes (Triton), and model packaging.

• Build MLOps foundations (MLflow, feature stores) and artifact/version governance.

• Operate data services (Kafka, PostgreSQL, Redis, MinIO/S3, Elasticsearch/OpenSearch) for high-throughput pipelines.


CI/CD & Developer Experience

• Own CI/CD with GitHub Actions/GitLab CI/Jenkins; establish trunk-based development, automated testing, and canary/blue-green releases.

• Maintain internal developer platforms, templates, and golden paths to improve delivery speed and quality.


Security, Compliance & Observability

• Implement least-privilege access, SSO (Okta/AAD), Vault-based secrets, image scanning (Trivy), and policy as code.

• Ensure alignment with SOC 2, ISO 27001, and HIPAA/GDPR, backed by audit trails and immutable logs.

• Build end-to-end observability using Prometheus, Grafana, Loki/EFK, and OpenTelemetry.


FinOps & Stakeholder Management

• Track cloud spend, rightsize resources, and negotiate quotas for GPU/compute.

• Partner with Product, Data Science, and Customer Success to plan capacity for new features and enterprise go-lives.


Required Qualifications & Skills

• Strong Kubernetes expertise (production operations, networking, security, Helm, GitOps).

• Proven IaC experience with Terraform and configuration management (Ansible).

• CI/CD at scale with GitHub Actions/GitLab CI/Jenkins; artifact registries and SBOMs.

• Observability: Prometheus, Grafana, ELK/EFK or Loki, alerting and runbooks.

• Cloud proficiency in at least one major provider (AWS/Azure/GCP) and Linux fundamentals.

• Security fundamentals: network segmentation, TLS, secrets management, container hardening.

• Experience running data/streaming systems (Kafka, Redis, PostgreSQL) in production.

• Excellent communication, incident leadership, and stakeholder management.


Nice-to-Have

• GPU orchestration, Triton Inference Server, Hugging Face model serving.

• Service mesh (Istio/Linkerd), API gateways, and zero-trust patterns.

• MLOps tooling (MLflow, Feast), Airflow, dbt.

• Compliance implementations for regulated industries (BFSI, healthcare).

• Certifications: CKA/CKAD, AWS/Azure/GCP Architect, Security+.


Our Ideal Candidate

• Drives reliability with automation, not toil.

• Balances speed and safety with measurable delivery improvements.

• Thrives in customer-facing, hybrid cloud, and on-prem environments.

• Coaches teams with clear standards, runbooks, and continuous improvement.


Tip for Candidates

If you want to build secure, high-performance platforms for real-world AI at enterprise scale, follow our page for more job openings like this one.
