Agentic Infrastructure Observability Engineer

XenonStack Moments

6 years

0 Lacs

sahibzada ajit singh nagar punjab india

Posted: 1 month ago


Skills Required

Data, AI, Vision, Inference, Design, Reliability, Metrics, Monitoring, Drift, Analysis, Latency, Automation, Plugins, Collaboration, DevOps, Engineering, Reporting, Efficiency, OpenTelemetry, AWS, GCP, Azure, Kubernetes, Python, Scripting, Model, Mobility, Transparency, Leadership, Learning

Work Mode

On-site

Job Type

Full Time

Job Description

About XenonStack

XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights.

We Deliver Innovation Through

Agentic Systems for AI Agents → akira.ai
Vision AI Platform → xenonstack.ai
Inference AI Infrastructure for Agentic Systems → nexastack.ai

Our mission is to accelerate the world’s transition to AI + Human Intelligence by building platforms that are scalable, reliable, and observable by design.

THE OPPORTUNITY

We are seeking an Agentic Infrastructure Observability Engineer to design and implement end-to-end observability frameworks for AI-native and multi-agent systems. This role sits at the heart of AgentOps and Reliability Engineering, ensuring that agents, pipelines, and infrastructure are monitored, measurable, and continuously optimized. If you thrive on metrics, monitoring, and making complex systems transparent and reliable, this role offers a chance to define observability for the next generation of enterprise AI.

Key Responsibilities

Observability Frameworks

Design and implement observability pipelines covering metrics, logs, traces, and cost telemetry for agentic systems.
Build dashboards and alerting systems to monitor reliability, performance, and drift in real time.

Agentic AI Monitoring

Track LLM usage, context windows, token allocation, and multi-agent interactions.
Build monitoring hooks into LangChain, LangGraph, MCP, and RAG pipelines.

Reliability & Performance

Define and monitor SLOs, SLIs, and SLAs for agentic workflows and inference infrastructure.
Conduct root cause analysis of agent failures, latency issues, and cost spikes.

Automation & Tooling

Integrate observability into CI/CD and AgentOps pipelines.
Develop custom plugins/scripts to extend observability for LLMs, agents, and data pipelines.

Collaboration & Reporting

Work with AgentOps, DevOps, and Data Engineering teams to ensure system-wide observability.
Provide executive-level reporting on reliability, efficiency, and adoption metrics.

Continuous Improvement

Implement feedback loops to improve agent performance and reduce downtime.
Stay current with state-of-the-art observability and AI monitoring frameworks.

Skills & Qualifications

Must-Have

3–6 years of experience in SRE, DevOps, or Observability Engineering.
Strong knowledge of observability tools (Prometheus, Grafana, ELK, OpenTelemetry, Jaeger).
Experience with cloud-native infrastructure (AWS, GCP, Azure) and Kubernetes monitoring.
Proficiency in Python, Go, or Bash for scripting and automation.
Understanding of AI/LLM pipelines, RAG systems, and vector databases.
Hands-on experience with CI/CD pipelines and monitoring-as-code.

Good-to-Have

Experience with AgentOps tools (LangSmith, PromptLayer, Arize AI, Weights & Biases).
Exposure to AI-specific observability (token usage, model latency, hallucination tracking).
Knowledge of Responsible AI monitoring frameworks.
Background in BFSI, GRC, SOC, or other regulated industries.

WHY SHOULD YOU JOIN US?

Agentic AI Product Company

Build observability frameworks for next-gen enterprise AI systems.

A Fast-Growing Category Leader

Be part of one of the fastest-growing AI Foundries, powering mission-critical agent deployments.

Career Mobility & Growth

Advance into roles like Reliability Architect, AgentOps Lead, or Head of Observability.

Global Exposure

Work on observability challenges across Fortune 500 enterprises and global innovators.

Create Real Impact

Ensure transparency, trust, and resilience in production-grade AI systems.

Culture of Excellence

Our values (Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession) give you autonomy to innovate and accountability to deliver.

Responsible AI First

Help enterprises adopt AI that is not just powerful, but explainable and auditable.

XENONSTACK CULTURE – JOIN US & MAKE AN IMPACT!

At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.

Our Cultural Values

Agency – Be self-directed and proactive.
Taste – Sweat the details and build with precision.
Ownership – Take responsibility for outcomes.
Mastery – Commit to continuous learning and growth.
Impatience – Move fast and embrace progress.
Customer Obsession – Always put the customer first.

Our Product Philosophy

Obsessed with Adoption – Making observability and trust an integral part of enterprise AI.
Obsessed with Simplicity – Turning complex monitoring into seamless, actionable insights.

Be part of our mission to accelerate the world’s transition to AI + Human Intelligence by making agentic AI systems