Posted: 6 days ago
AIOps Lead
Location: Chandigarh (On-site)
Experience: 3 to 5 years (AI/ML + DevOps + Observability)
Employment Type: Full-time
About the Role
We are looking for a next-generation AIOps Engineer to design and operate AI-driven, self-healing, and intelligent infrastructure systems.
In this role, you’ll fuse MLOps, DevOps, and agentic AI systems — leveraging technologies like Ray, vLLM, SGLang, and PyTorch Lightning to build predictive, autonomous, and scalable operational pipelines.
You will develop intelligent observability systems capable of detecting, diagnosing, and resolving issues in real time — powered by distributed AI and LLM-based automation.
Key Responsibilities
• Design, implement, and scale AIOps pipelines that collect, analyze, and act on telemetry data across infrastructure and applications (see the first sketch after this list).
• Build and deploy distributed ML/LLM workflows using Ray, PyTorch Lightning, vLLM, or SGLang for anomaly detection, event correlation, and predictive maintenance.
• Orchestrate LLM-based operations agents using LangChain, LangGraph, or SGLang to power AI-assisted diagnostics and root-cause analysis (see the second sketch after this list).
• Implement intelligent observability layers over systems like Prometheus, Grafana, ELK, OpenTelemetry, or Datadog to enable AI-driven insights and alerting.
• Develop self-healing systems leveraging AI and automation frameworks to auto-remediate incidents.
• Optimize inference serving and distributed compute with vLLM, Ray Serve, and Triton Inference Server for low-latency response times.
• Build real-time data ingestion pipelines using Kafka, Spark, or Flink for operational and telemetry data.
• Collaborate with SRE, MLOps, and AI engineering teams to create autonomous, adaptive infrastructure systems.
• Integrate CI/CD pipelines for AI workflows using MLflow, Kubeflow, or Airflow, with model monitoring and drift detection.
• Evaluate and integrate AIOps platforms (Moogsoft, BigPanda, Datadog AIOps, Dynatrace, etc.) and agentic frameworks for proactive automation.
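For illustration only, here is a minimal Python sketch of one pipeline stage of the kind described in the first bullet above: it pulls a metric from a Prometheus endpoint over the HTTP API, compares the latest sample against a rolling baseline, and flags an anomaly. The endpoint URL, PromQL query, and z-score threshold are assumed placeholders, not details taken from this role.

# Minimal sketch of one AIOps pipeline stage (all values illustrative).
import statistics
import time
import requests

PROM_URL = "http://prometheus:9090"   # hypothetical Prometheus endpoint
QUERY = 'avg(rate(node_cpu_seconds_total{mode!="idle"}[5m]))'
Z_THRESHOLD = 3.0                     # illustrative anomaly threshold

def fetch_series(window_s=3600, step_s=60):
    """Fetch the last hour of the metric via the Prometheus HTTP API."""
    end = time.time()
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        params={"query": QUERY, "start": end - window_s, "end": end, "step": step_s},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return [float(value) for _, value in result[0]["values"]] if result else []

def is_anomalous(series):
    """Flag the newest sample if it sits far outside the recent baseline."""
    if len(series) < 10:
        return False
    baseline, latest = series[:-1], series[-1]
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline) or 1e-9
    return abs(latest - mean) / stdev > Z_THRESHOLD

if __name__ == "__main__":
    if is_anomalous(fetch_series()):
        # A production pipeline would publish this event to Kafka or hand it
        # to a remediation agent rather than printing it.
        print("anomaly detected in cluster CPU utilization")

In practice this stage would run as a scheduled job or Ray task, with the resulting event published to Kafka for downstream correlation and auto-remediation.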
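Similarly, the AI-assisted diagnostics in the second referenced bullet could start from something as small as the sketch below, which feeds a few correlated alert lines to a model loaded through vLLM's offline inference API and asks for a probable root cause. The model name, prompt, and alert lines are hypothetical; a real deployment would serve the model behind Ray Serve or vLLM's OpenAI-compatible server and pull alerts from the observability stack.

# Rough sketch: summarize a probable root cause from recent alerts with vLLM.
from vllm import LLM, SamplingParams

recent_alerts = [
    "10:02Z checkout-api p99 latency 4.8s (SLO 300ms)",
    "10:03Z postgres-primary connection pool exhausted",
    "10:04Z checkout-api 5xx rate 12%",
]  # placeholder telemetry; a real agent would query the observability stack

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")   # placeholder model name
params = SamplingParams(temperature=0.2, max_tokens=256)

prompt = (
    "You are an SRE assistant. Given these correlated alerts, state the most "
    "likely root cause and one remediation step:\n" + "\n".join(recent_alerts)
)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)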
Required Skills & Qualifications
• Bachelor’s or Master’s in Computer Science, Engineering, or related field.
• 4+ years of experience in DevOps, SRE, or AI infrastructure engineering.
• Strong programming experience in Python (preferred), Go, or Bash scripting.
• Deep understanding of cloud platforms (AWS, GCP, Azure) and Kubernetes/Docker orchestration.
• Expertise in infrastructure as code (Terraform, Helm, Pulumi).
• Experience with distributed compute frameworks — Ray, PyTorch Lightning, vLLM, SGLang.
• Proficiency with observability and monitoring stacks (Prometheus, Grafana, ELK, OpenTelemetry, Splunk).
• Familiarity with MLOps and LLMOps tools (MLflow, Kubeflow, Airflow, ArgoCD).
• Experience with event-driven systems and message queues (Kafka, RabbitMQ, AWS SQS).
• Understanding of AI-powered automation, root cause analysis, and predictive operational analytics.
Preferred / Nice-to-Have
• Hands-on experience with vLLM for optimized LLM inference and observability agents.
• Experience deploying and optimizing Ray Serve, vLLM, or Triton in production.
• Exposure to SGLang for LLM-based orchestration, workflow automation, and diagnostics reasoning.
• Familiarity with vector databases (Milvus, Weaviate, Pinecone) and RAG-based observability.
• Experience with agentic AIOps frameworks and LLM-driven operational reasoning (LangGraph, AutoGen, CrewAI).
• Understanding of AI observability, drift detection, cost-aware scaling, and fault-tolerant AI systems.
• Contributions to open-source AIOps, observability, or distributed AI infrastructure projects.
What We Offer
• Opportunity to build the foundation for autonomous, intelligent operations.
• Hands-on exposure to SGLang, vLLM, Ray, PyTorch Lightning, and LangGraph ecosystems.
• Collaborative, cross-functional environment spanning AI, cloud, and systems engineering.
• Competitive compensation, flexible work setup, and professional development opportunities.
 
Company: PaladinAi