About XenonStack
XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights.
We Deliver Innovation Through
- Agentic Systems for AI Agents → akira.ai
- Vision AI Platform → xenonstack.ai
- Inference AI Infrastructure for Agentic Systems → nexastack.ai
Our mission is to accelerate the world's transition to AI + Human Intelligence by building platforms that are scalable, reliable, and observable by design.
THE OPPORTUNITY
We are seeking an Agentic Infrastructure Observability Engineer to design and implement end-to-end observability frameworks for AI-native and multi-agent systems. This role sits at the heart of AgentOps and Reliability Engineering, ensuring that agents, pipelines, and infrastructure are monitored, measurable, and continuously optimized. If you thrive on metrics, monitoring, and making complex systems transparent and reliable, this role offers a chance to define observability for the next generation of enterprise AI.
Key Responsibilities
- Observability Frameworks
  - Design and implement observability pipelines covering metrics, logs, traces, and cost telemetry for agentic systems.
  - Build dashboards and alerting systems to monitor reliability, performance, and drift in real-time.
- Agentic AI Monitoring
  - Track LLM usage, context windows, token allocation, and multi-agent interactions.
  - Build monitoring hooks into LangChain, LangGraph, MCP, and RAG pipelines.
- Reliability & Performance
  - Define and monitor SLOs, SLIs, and SLAs for agentic workflows and inference infrastructure.
  - Conduct root cause analysis of agent failures, latency issues, and cost spikes.
- Automation & Tooling
  - Integrate observability into CI/CD and AgentOps pipelines.
  - Develop custom plugins/scripts to extend observability for LLMs, agents, and data pipelines.
- Collaboration & Reporting
  - Work with AgentOps, DevOps, and Data Engineering teams to ensure system-wide observability.
  - Provide executive-level reporting on reliability, efficiency, and adoption metrics.
- Continuous Improvement
  - Implement feedback loops to improve agent performance and reduce downtime.
  - Stay updated with state-of-the-art observability and AI monitoring frameworks.
Skills & Qualifications
Must-Have
- 3–6 years of experience in SRE, DevOps, or Observability Engineering.
- Strong knowledge of observability tools (Prometheus, Grafana, ELK, OpenTelemetry, Jaeger).
- Experience with cloud-native infrastructure (AWS, GCP, Azure) and Kubernetes monitoring.
- Proficiency in Python, Go, or Bash for scripting and automation.
- Understanding of AI/LLM pipelines, RAG systems, and vector databases.
- Hands-on experience with CI/CD pipelines and monitoring-as-code.
Good-to-Have
- Experience with AgentOps tools (LangSmith, PromptLayer, Arize AI, Weights & Biases).
- Exposure to AI-specific observability (token usage, model latency, hallucination tracking).
- Knowledge of Responsible AI monitoring frameworks.
- Background in BFSI, GRC, SOC, or other regulated industries.
WHY SHOULD YOU JOIN US
- Agentic AI Product Company
Build observability frameworks for next-gen enterprise AI systems.
- A Fast-Growing Category Leader
Be part of one of the fastest-growing AI Foundries, powering mission-critical agent deployments.
Advance into roles like Reliability Architect, AgentOps Lead, or Head of Observability.
Work on observability challenges across Fortune 500 enterprises and global innovators.
Ensure transparency, trust, and resilience in production-grade AI systems.
Our values (Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession) give you autonomy to innovate and accountability to deliver.
Help enterprises adopt AI that is not just powerful, but explainable and auditable.
XENONSTACK CULTURE: JOIN US & MAKE AN IMPACT!
At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.
Our Cultural Values
- Agency: Be self-directed and proactive.
- Taste: Sweat the details and build with precision.
- Ownership: Take responsibility for outcomes.
- Mastery: Commit to continuous learning and growth.
- Impatience: Move fast and embrace progress.
- Customer Obsession: Always put the customer first.
Our Product Philosophy
- Obsessed with Adoption: Making observability and trust an integral part of enterprise AI.
- Obsessed with Simplicity: Turning complex monitoring into seamless, actionable insights.
Be part of our mission to accelerate the world's transition to AI + Human Intelligence by making agentic AI systems transparent, observable, and reliable at scale.