Required Skills:
Python (FastAPI), LangChain, LlamaIndex, OpenAI API, Anthropic
Additional Skills / Good to have:
Postgres (Pgvector), Supabase, Redis, Docker, LangSmith, GitHub Actions
We're looking for a passionate AI Engineer to build and scale production-grade AI systems powering our platform. You'll work on real-world AI challenges including document intelligence, real-time voice AI, and intelligent job-to-candidate matching systems.
Role & Responsibilities:
- Build Agentic Workflows: Design and deploy AI agents capable of reasoning and decision-making using frameworks like LangChain or LlamaIndex.
- Architect Voice AI Systems: Work on low-latency, real-time conversational bots for candidate outreach using WebSockets, STT, and TTS, ensuring natural state management and context retention.
- Engineer Robust Data Pipelines: Build parsing modules that force LLMs to return strict JSON schemas for resume data extraction, and implement cleaning pipelines for unstructured data (PDF/DOCX).
- Implement Advanced RAG: Develop retrieval systems using Pgvector or Pinecone that use Hybrid Search (semantic + keyword) to ensure accurate job-to-candidate matching.
- Productionize & Observe: Set up tracing and observability using tools like LangSmith to debug complex chains, monitor token usage, and optimize costs.
- Backend Integration: Wrap AI logic into scalable, asynchronous microservices using Python (FastAPI) and containerize them with Docker.
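To give a flavor of the Hybrid Search work mentioned above, here is a minimal sketch of one common way to merge a semantic ranking with a keyword ranking: reciprocal rank fusion. This is an illustration only, not a description of our production system; the function name, the candidate IDs, and the constant k = 60 are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs into one list.

    Each ID earns 1 / (k + rank) from every list it appears in, so IDs
    ranked well by both the semantic and the keyword search rise to the top.
    k = 60 is a commonly used default, assumed here for illustration.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the result is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical candidate IDs returned by two separate searches.
semantic_hits = ["c1", "c2", "c3"]
keyword_hits = ["c2", "c4", "c1"]
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Candidates that appear in both lists (c1 and c2 here) outrank those found by only one search, which is the behavior a hybrid matcher usually wants.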
Technical Requirements:
1. Generative AI & LLM Engineering
- Structured Outputs: Proven experience forcing LLMs to output valid JSON schemas via function calling (essential for data parsing tasks).
- Prompt Engineering: Deep understanding of prompting strategies (Chain-of-Thought, Few-Shot) and ability to design robust system prompts that handle edge cases gracefully.
- Orchestration: Hands-on experience building complex chains and retrieval loops using LangChain or LlamaIndex.
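As an example of what the Structured Outputs requirement looks like in practice, the sketch below validates a model reply against a small resume schema before it enters a pipeline. It uses only the standard library; the `ResumeRecord` fields and the `parse_resume_json` helper are hypothetical names for illustration, and a real system would typically pair this with function calling on the model side.

```python
import json
from dataclasses import dataclass


@dataclass
class ResumeRecord:
    """Hypothetical target schema for extracted resume data."""
    name: str
    email: str
    skills: list[str]


# Expected top-level fields and their JSON types (assumed for this example).
REQUIRED_FIELDS = {"name": str, "email": str, "skills": list}


def parse_resume_json(raw: str) -> ResumeRecord:
    """Parse a model reply and reject anything that does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model reply must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"field {field!r} must be {expected.__name__}")
    if not all(isinstance(skill, str) for skill in data["skills"]):
        raise ValueError("every skill must be a string")
    return ResumeRecord(data["name"], data["email"], data["skills"])
```

Rejecting a malformed reply with a clear error makes it straightforward to retry the model call or route the document for manual review.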
2. Voice AI & Real-Time Processing
- Audio Stack: Experience with STT/TTS APIs (Whisper, Deepgram).
- Streaming & Latency: Mastery of WebSockets and asynchronous programming to handle streaming audio with sub-second latency.
- State Management: Ability to architect conversation managers that maintain "memory" (conversation history and prior answers) during a live call.
3. Search & Data (RAG)
- Vector Databases: Proficiency with vector stores like Pgvector (Supabase), Pinecone, or Qdrant.
- Ingestion: Experience constructing pipelines for chunking and cleaning unstructured documents.
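For the Ingestion item, a minimal sketch of cleaning and chunking might look like the following. It splits on characters with a fixed overlap, which is an assumption made to keep the example short; production pipelines often split on tokens, sentences, or document sections instead. The function name and default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Collapse whitespace, then split text into overlapping character chunks."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    # Cleaning step: collapse runs of spaces, tabs, and newlines left by PDF/DOCX extraction.
    cleaned = " ".join(text.split())
    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(cleaned):
        chunks.append(cleaned[start:start + chunk_size])
        # Stop once this chunk reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(cleaned):
            break
        start += step
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, which tends to help retrieval quality.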
4. MLOps & Production Engineering
- Observability: Experience tracking traces, latency, and errors using LangSmith.
- Evals: Ability to write automated evaluation scripts ("unit tests for AI") to verify prompt performance against datasets before deployment.
- Cost Optimization: Experience monitoring token consumption and implementing strategies to balance intelligence vs. cost (e.g., routing simpler tasks to smaller models).
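To make the Evals item concrete, here is a minimal sketch of a "unit test for AI": it scores a prediction function against a small labeled dataset and returns a pass rate that a deployment check could compare to a threshold. `fake_model`, the dataset, and the exact-match scoring rule are all stand-ins assumed for illustration; a real script would call the actual prompt and model and would likely use a richer metric.

```python
from typing import Callable


def pass_rate(predict: Callable[[str], str], dataset: list[tuple[str, str]]) -> float:
    """Return the fraction of prompts whose prediction matches the expected answer."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    hits = sum(
        1
        for prompt, expected in dataset
        # Exact match after trimming and lowercasing (an assumed scoring rule).
        if predict(prompt).strip().lower() == expected.strip().lower()
    )
    return hits / len(dataset)


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, so the example runs offline."""
    return "Python" if "fastapi" in prompt.lower() else "unknown"


# Hypothetical evaluation set of (prompt, expected answer) pairs.
DATASET = [
    ("Which language is FastAPI written in?", "python"),
    ("Which language is Rails written in?", "ruby"),
]

score = pass_rate(fake_model, DATASET)
```

In a CI job, a check such as `assert score >= 0.9` (threshold assumed) would block a prompt change that regresses on the dataset.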
5. Core Backend
- Python: Python skills, specifically with FastAPI.
- Async/Concurrency: Mastery of async/await patterns to handle concurrent resume parsing and multiple voice calls simultaneously.
- Infrastructure: Proficiency with Docker and basic SQL for relational data management.
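The Async/Concurrency item can be illustrated with a small sketch that parses many resumes concurrently while capping how many run at once. `parse_resume` is a hypothetical placeholder that simulates I/O with a short sleep, and the concurrency limit is an assumed value; a real implementation would await an LLM or storage call instead.

```python
import asyncio


async def parse_resume(resume_id: int) -> dict:
    """Placeholder for real parsing work; the sleep simulates a network call."""
    await asyncio.sleep(0.01)
    return {"id": resume_id, "status": "parsed"}


async def parse_all(resume_ids: list[int], max_concurrency: int = 5) -> list[dict]:
    """Parse all resumes concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(resume_id: int) -> dict:
        # The semaphore keeps a burst of uploads from exhausting API rate limits.
        async with semaphore:
            return await parse_resume(resume_id)

    # gather returns results in the same order as the input IDs.
    return await asyncio.gather(*(bounded(rid) for rid in resume_ids))
```

Calling `asyncio.run(parse_all(list(range(10)), max_concurrency=3))` would process ten resumes with no more than three running at any moment.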
6. Machine Learning & Algorithms (Good to have)
- Recommendation Logic: Understanding of core matching concepts beyond just embeddings (e.g., Collaborative Filtering, Matrix Factorization, or Two-Tower Architecture).
- Ranking & Scoring: Experience implementing Learning to Rank (LTR) or Re-ranking strategies (Cross-Encoders) to sort thousands of candidates accurately.
- Predictive Modeling: Familiarity with traditional ML libraries (Scikit-learn, XGBoost) to build classification models (e.g., "Predicting candidate joining probability").
The "Applied Mindset" We Need:
- Model Strategist: You know when to use GPT-4o and when to use a cheaper, faster model like GPT-4o-mini or a local Llama instance, so that you balance cost, latency, and intelligence effectively.
- Security First: You understand the risks of Prompt Injection and Jailbreaking, especially in public-facing interview bots, and you know how to mitigate them.
- Hallucination Mitigation: You don't trust the model blindly. You use grounding techniques to ensure the AI sticks to the provided facts.
Tech Stack Overview:
- Languages: Python (FastAPI).
- AI/Orchestration: LangChain, LlamaIndex, OpenAI API, Anthropic.
- Voice: Deepgram, Twilio (optional but a plus).
- Database: Postgres (Pgvector), Supabase, Redis.
- Ops: Docker, LangSmith, GitHub Actions.