Posted:2 weeks ago|
Platform:
On-site
Full Time
**Who you are**
You’ve stepped beyond traditional QA—you test AI agents, not just UI clicks. You build automated tests that check for **hallucinations, bias, adversarial inputs**, prompt chain integrity, model outputs, and multi-agent orchestration failures. You script Python tests and use Postman/Selenium/Playwright for UI/API, and JMeter or k6 for load. You understand vector databases and can test embedding correctness and data flows. You can ask, “What happens when two agents clash?” or “If one agent hijacks context, does the system fail?” and then write tests for these edge cases. You’re cloud-savvy—Azure or AWS—and integrate tests into CI/CD. You debug failures in agent-manager systems and help triage model logic vs infra issues. You take ownership of AI test quality end-to-end.
---
**What you’ll actually do**
You’ll design **component & end-to-end tests** for multi-agent GenAI workflows (e.g., planner + execution + reporting agents). You’ll script pytest + Postman + Playwright suites that test API functionality, failover logic, agent coordination, and prompt chaining. You’ll simulate coordination failures, misalignment, hallucinations in agent dialogues. You’ll run load tests on LLM endpoints, track latency and cost. You’ll validate that vector DB pipelines (Milvus/FAISS/Pinecone) return accurate embeddings and retrieval results. You’ll build CI/CD pipelines (Azure DevOps, GitHub Actions, Jenkins) that gate merges based on model quality thresholds. You’ll implement drift, bias, hallucination metrics, and create dashboards for QA monitoring. You’ll occasion a human-in-the-loop sanity check for critical agent behavior. You’ll write guides so others understand how to test GenAI pipelines.
---
**Skills and knowledge**
• Python automation—pytest/unittest for component & agent testing
• Postman/Newman, Selenium/Playwright/Cypress for UI/API test flows
• Load/performance tools—JMeter, k6 for inference endpoints
• SQL/NoSQL and data validation for vector DB pipelines
• Vector DB testing—Milvus, FAISS, Pinecone embeddings/retrieval accuracy
• GenAI evaluation—hallucinations, bias/fairness, embedding similarity (BLEU, ROUGE), adversarial/prompt injection testing
• Multi-agent testing—understand component/unit tests per agent, inter-agent communications, coordination failure tests, message passing or blackboard rhythm, emergent behavior monitoring
• CI/CD integration—Azure DevOps/GitHub Actions/Jenkins pipelines, gating on quality metrics
• Cloud awareness—testing in Azure/AWS/GCP, GenAI endpoints orchestration and failure mode testing
• Monitoring & observability—drift, latency, hallucination rate dashboards
• Soft traits—detail oriented, QA mindset, self-driven, cross-functional communicator, ethical awareness around AI failures.
Serenovolante Software Services Private Limited
Upload Resume
Drag or click to upload
Your data is secure with us, protected by advanced encryption.
Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.
We have sent an OTP to your contact. Please enter it below to verify.
Practice Python coding challenges to boost your skills
Start Practicing Python NowExperience: Not specified
Salary: Not disclosed
Experience: Not specified
Salary: Not disclosed