About XenonStack
XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights.
We Deliver Innovation Through
- Agentic Systems for AI Agents → akira.ai
- Vision AI Platform → xenonstack.ai
- Inference AI Infrastructure for Agentic Systems → nexastack.ai
Our mission is to accelerate the world’s transition to AI + Human Intelligence by building platforms that are scalable, reliable, and observable by design.
THE OPPORTUNITY
We are seeking an Agentic Infrastructure Observability Engineer to design and implement end-to-end observability frameworks for AI-native and multi-agent systems. This role sits at the heart of AgentOps and Reliability Engineering, ensuring that agents, pipelines, and infrastructure are monitored, measurable, and continuously optimized. If you thrive on metrics, monitoring, and making complex systems transparent and reliable, this role offers a chance to define observability for the next generation of enterprise AI.
Key Responsibilities
- Observability Frameworks
  - Design and implement observability pipelines covering metrics, logs, traces, and cost telemetry for agentic systems.
  - Build dashboards and alerting systems to monitor reliability, performance, and drift in real time.
- Agentic AI Monitoring
  - Track LLM usage, context windows, token allocation, and multi-agent interactions.
  - Build monitoring hooks into LangChain, LangGraph, MCP, and RAG pipelines.
- Reliability & Performance
  - Define and monitor SLOs, SLIs, and SLAs for agentic workflows and inference infrastructure.
  - Conduct root cause analysis of agent failures, latency issues, and cost spikes.
- Automation & Tooling
  - Integrate observability into CI/CD and AgentOps pipelines.
  - Develop custom plugins/scripts to extend observability for LLMs, agents, and data pipelines.
- Collaboration & Reporting
  - Work with AgentOps, DevOps, and Data Engineering teams to ensure system-wide observability.
  - Provide executive-level reporting on reliability, efficiency, and adoption metrics.
- Continuous Improvement
  - Implement feedback loops to improve agent performance and reduce downtime.
  - Stay updated with state-of-the-art observability and AI monitoring frameworks.
Skills & Qualifications
Must-Have
- 3–6 years of experience in SRE, DevOps, or Observability Engineering.
- Strong knowledge of observability tools (Prometheus, Grafana, ELK, OpenTelemetry, Jaeger).
- Experience with cloud-native infrastructure (AWS, GCP, Azure) and Kubernetes monitoring.
- Proficiency in Python, Go, or Bash for scripting and automation.
- Understanding of AI/LLM pipelines, RAG systems, and vector databases.
- Hands-on experience with CI/CD pipelines and monitoring-as-code.
Good-to-Have
- Experience with AgentOps tools (LangSmith, PromptLayer, Arize AI, Weights & Biases).
- Exposure to AI-specific observability (token usage, model latency, hallucination tracking).
- Knowledge of Responsible AI monitoring frameworks.
- Background in BFSI, GRC, SOC, or other regulated industries.
WHY SHOULD YOU JOIN US?
- Agentic AI Product Company
Build observability frameworks for next-gen enterprise AI systems.
- A Fast-Growing Category Leader
Be part of one of the fastest-growing AI Foundries, powering mission-critical agent deployments.
Advance into roles like Reliability Architect, AgentOps Lead, or Head of Observability.
Work on observability challenges across Fortune 500 enterprises and global innovators.
Ensure transparency, trust, and resilience in production-grade AI systems.
Our values (Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession) give you autonomy to innovate and accountability to deliver.
Help enterprises adopt AI that is not just powerful, but explainable and auditable.
XENONSTACK CULTURE – JOIN US & MAKE AN IMPACT!
At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.
Our Cultural Values
- Agency – Be self-directed and proactive.
- Taste – Sweat the details and build with precision.
- Ownership – Take responsibility for outcomes.
- Mastery – Commit to continuous learning and growth.
- Impatience – Move fast and embrace progress.
- Customer Obsession – Always put the customer first.
Our Product Philosophy
- Obsessed with Adoption – Making observability and trust an integral part of enterprise AI.
- Obsessed with Simplicity – Turning complex monitoring into seamless, actionable insights.
Be part of our mission to accelerate the world’s transition to AI + Human Intelligence by making agentic AI systems transparent, observable, and reliable at scale.