Posted: 1 day ago
On-site
Part Time
XenonStack is the fastest-growing data and AI foundry for agentic systems, enabling people and organizations to gain real-time and intelligent business insights.
Agentic Systems for AI Agents: akira.ai
Vision AI Platform: xenonstack.ai
Inference AI Infrastructure for Agentic Systems: nexastack.ai
We are seeking an Agentic Infrastructure Observability Engineer to design, implement, and maintain visibility, monitoring, and assurance systems for large-scale AI agent deployments.
This role focuses on observability, telemetry, and evaluation pipelines across multi-agent and multi-context workflows, ensuring AI systems are measurable, trustworthy, and compliant in enterprise and regulated environments.
If you’re passionate about SRE principles for AI, LLM evaluation, and agentic system transparency, this role offers the chance to shape observability for the next generation of intelligent automation.
Must-Have:
3–5 years in SRE, DevOps, AI infrastructure, or ML systems engineering.
Proficiency in Python and observability stacks (Prometheus, OpenTelemetry, Grafana, ELK, etc.).
Familiarity with LLM architectures, multi-agent orchestration frameworks (LangGraph, LangChain, AgentBridge), and context pipelines.
Experience with logging, tracing, and performance profiling for distributed systems.
Understanding of LLM evaluation metrics (factuality, coherence, toxicity, cost efficiency).
Knowledge of privacy and compliance standards for AI systems.
Good-to-Have:
Hands-on experience with LLM eval tools (TruLens, Ragas, Arize AI, Weights & Biases).
Familiarity with RAG, vector databases, and knowledge graph-based retrieval.
Experience in regulated industries (BFSI, healthcare, GRC).
Background in anomaly detection or behavioral monitoring for ML systems.
Continuous Learning & Growth
Training and certifications in AI observability, LLM evaluation, and Responsible AI.
Hands-on exposure to enterprise-scale agentic infrastructure.
Recognition & Rewards
Incentives for innovations in AI observability and monitoring.
Fast-track opportunities into AI Reliability Architecture or Model Ops Leadership roles.
Work Benefits & Well-Being
Comprehensive medical insurance and project-based allowances.
Cab facilities for women employees and special project perks.
We foster a culture of cultivation with bold, human-centric leadership principles. We value deep work, experimentation, and ownership in every initiative, and we are on a mission to reshape how enterprises adopt AI + Human Intelligence systems.
Product Values:
Obsessed with Adoption – Making AI accessible and enterprise-ready.
Obsessed with Simplicity – Turning complexity into seamless, intuitive AI experiences.
Be a part of our vision to accelerate the world’s transition to AI + Human Intelligence.
XenonStack
2.4 - 7.2 Lacs P.A.