Role Description
Job Title
QE Architect – AI / LLM Systems
Role Summary
We are looking for a visionary QE Architect – AI/LLM Systems to architect, define, and drive the overall Quality Engineering (QE) strategy for next-generation AI products. This role will focus on building scalable quality architectures, AI evaluation frameworks, and automated testing pipelines to ensure reliable, safe, and high-quality AI-driven user experiences. The ideal candidate will bring strong thought leadership and deep technical expertise, working at the intersection of AI/ML, LLM systems, software engineering, and quality governance.

Key Responsibilities
QE Architecture & Strategy
- Define and own the end-to-end quality architecture for all AI and LLM initiatives across the organization.
- Design enterprise-level QE frameworks and reusable components for:
  - Conversational AI applications and chatbots
  - Knowledge-management bots and RAG systems
  - Semantic and vector-based text search
  - Image search and multimodal AI systems
  - Generative AI platforms
- Establish scalable testing pipelines for model evaluation, data validation, and automation.
AI / LLM Evaluation Frameworks
- Architect comprehensive evaluation systems for:
  - Prompt testing and scenario-based validation
  - LLM output quality, safety, bias, and consistency
  - Hallucination detection and mitigation
  - RAG correctness, grounding accuracy, and knowledge integrity
  - Search relevance and ranking metrics
- Build automated scorecards and continuous evaluation dashboards.
Automation & Infrastructure
- Design and implement automation frameworks for:
  - LLM APIs and chat agents
  - Multimodal AI pipelines
  - Vector databases and semantic search services
- Architect model regression detection using:
  - Golden datasets
  - Synthetic test data generation
  - LLM-as-a-Judge approaches
  - Self-evaluation and multi-agent evaluation techniques
- Integrate AI test harnesses into CI/CD and LLMOps pipelines.
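To make the golden-dataset regression pattern above concrete, a minimal sketch is shown below. The dataset, prompts, threshold, and function names are illustrative, and the judge is a stub token-overlap heuristic; a real harness would replace it with an LLM call that scores the candidate answer against a rubric.

```python
# Minimal sketch: golden-dataset regression detection with an
# LLM-as-a-Judge scoring step. The judge here is a stub heuristic;
# in practice it would call an LLM with a grading rubric.

GOLDEN_SET = [
    {"prompt": "What is our refund window?",
     "reference": "Refunds are accepted within 30 days."},
    {"prompt": "How do I reset my password?",
     "reference": "Use the forgot-password link on the login page."},
]

def judge(reference: str, candidate: str) -> float:
    """Stub judge: fraction of reference tokens found in the candidate.
    Replace with an LLM call returning a rubric-based score in [0, 1]."""
    ref_tokens = set(reference.lower().split())
    cand_tokens = set(candidate.lower().split())
    return len(ref_tokens & cand_tokens) / len(ref_tokens) if ref_tokens else 0.0

def regression_check(generate, baseline_scores, threshold=0.05):
    """Score a candidate model on the golden set and flag any prompt whose
    judge score drops more than `threshold` below its stored baseline."""
    regressions = []
    for case, baseline in zip(GOLDEN_SET, baseline_scores):
        score = judge(case["reference"], generate(case["prompt"]))
        if score < baseline - threshold:
            regressions.append((case["prompt"], baseline, score))
    return regressions
```

A CI gate would run `regression_check` against the current production model's baseline scores and fail the pipeline when the returned list is non-empty.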
Data Quality & Test Data Strategy
- Define enterprise-wide AI test data management strategies, including:
  - Ground-truth datasets
  - Benchmark datasets
  - Adversarial and edge-case inputs
  - Safety and compliance-focused test scenarios
Architecture Reviews & Cross-Team Leadership
- Provide architectural guidance to ML engineers, data engineers, and software teams on testability and observability.
- Review AI system architectures, including model pipelines, chatflows, orchestration layers, and search systems.
- Drive quality gates across experimentation, pre-production, and production rollout cycles.
Quality Governance & Best Practices
- Establish enterprise standards for:
  - AI testing taxonomies and methodologies
  - Privacy, safety, and compliance validation
  - Defect classification for LLM-specific issues
  - Reliability, latency, and scalability benchmarks
- Lead adoption of AI/ML Quality Engineering best practices across teams.
Required Qualifications
- 10+ years of experience in Quality Engineering, with at least 3 years in AI/ML/LLM systems.
- Strong understanding of:
  - Large Language Models (LLMs), NLP, embeddings, and vector databases
  - Chatbot platforms such as Dialogflow, Rasa, Botpress, and Amazon Lex
  - RAG pipelines and knowledge-management systems
  - Image search and multimodal AI architectures
- Strong programming experience in Python, Java, or TypeScript, with ML/NLP libraries.
- Proven experience building CI/CD-integrated AI test automation frameworks.
- Hands-on knowledge of AI evaluation metrics such as:
  - Perplexity, factuality, and grounding scores
  - CER/WER, BLEU, ROUGE
  - MRR, NDCG, and search relevance metrics
  - Model drift and performance stability metrics
- Experience handling non-deterministic testing, probabilistic evaluation, and AI quality challenges.
- Proven ability to architect and scale enterprise-grade QE systems.
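As an illustration of the search-relevance metrics named in the qualifications above, a minimal sketch of MRR and NDCG over binary or graded relevance judgments (function names and inputs are illustrative):

```python
import math

def mrr(ranked_relevance_lists):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant
    result across queries; 0 for queries with no relevant result."""
    total = 0.0
    for rels in ranked_relevance_lists:
        rr = 0.0
        for rank, rel in enumerate(rels, start=1):
            if rel:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_relevance_lists)

def ndcg(relevances, k=None):
    """Normalized Discounted Cumulative Gain for one ranked list of
    graded relevances (linear gain, log2 position discount)."""
    k = k or len(relevances)
    def dcg(rels):
        return sum(r / math.log2(pos + 1)
                   for pos, r in enumerate(rels[:k], start=1))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered result list scores NDCG of 1.0; an automated scorecard would track these per query set across model or index versions.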
Core Skills
- AI / ML / LLM Systems
- QE Architecture
- Python / Java / TypeScript
- CI/CD & LLMOps
- Automation Frameworks
- RAG & Vector Search
- AI Quality Metrics
Skills
Automation Testing, AI/ML/LLM Systems, Python, Java or TypeScript, CI/CD, QE Architecture