Who you are

You're someone who deeply understands how modern LLMs and GenAI systems actually work, and what it takes to bring them to life in production. You've probably built your own internal chatbot, a smart assistant, or a multi-agent workflow just to try something out. You follow new research not just out of curiosity, but to build: you're comfortable reading a paper on AgentScope or RAG best practices on a weekend and prototyping it on Monday. You don't wait for instructions; you take full ownership of ideas and ship fast.

You understand what it means to work in a startup: incomplete specs, ambiguity, no predefined playbooks. You know how to write clean Python code, deploy with Docker, debug model outputs, and get a microservice up in a day. You enjoy mentoring juniors, being self-critical of your prompts, and building end-to-end systems that someone will actually use. You like learning new tools, but you care more about building things that work well than about following tool hype.

---

What you will actually do

- Design and ship multi-agent GenAI systems for internal users, such as smart chatbots, assistant systems, and intelligent document processors.
- Build complete RAG pipelines, from ingestion to embedding, retrieval, and hallucination detection.
- Write advanced prompts, run evaluations, and fine-tune models with parameter-efficient techniques such as LoRA or QLoRA when needed.
- Build REST APIs with FastAPI or Django to expose AI models as usable tools across the org.
- Containerize and deploy models and services on Azure with Kubernetes, and wire in monitoring, cost tracking, and performance analytics.
- Work with vector databases such as Milvus or Pinecone to store and retrieve embeddings.
- Support semantic logging, debug long conversation chains, and keep latency within usable limits.
- Help shape internal coding and testing practices for GenAI development, and help juniors get their components production-ready.
- Explore papers and ideas regularly, pitch improvements, and rapidly test new models and tools in the stack.

---

Skills and knowledge

- Strong in Python, especially building backend APIs with FastAPI or Django
- Deep knowledge of GenAI and LLM workflows, including RAG, prompt chaining, prompt tuning, memory tools, and multi-agent systems
- Experience with frameworks such as LangChain, LangGraph, CrewAI, the OpenAI SDK, the Google Agents SDK, and Pydantic AI
- Comfortable with cloud infrastructure, especially Azure OpenAI, Azure ML, and Azure DevOps; familiar with Docker and Kubernetes
- Familiar with LLM fine-tuning techniques such as LoRA, QLoRA, and PEFT
- Solid hands-on understanding of vector databases such as Milvus, Pinecone, and FAISS, as well as relational databases (PostgreSQL or MySQL)
- Understands how to implement observability for GenAI systems: prompt-level logs, hallucination tracking, and cost and latency monitoring
- Able to build simple UI/UX interfaces for internal demos when needed, or to integrate effectively with frontend teams
- Strong research mindset: reads arXiv papers, tries cutting-edge tools, and evaluates new agent frameworks in practice
- Self-driven: operates with minimal management oversight and delivers independently
- Willing to mentor, document code, write quick internal guides for tools, and present findings to the team regularly
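For a flavor of the RAG work described above, here is a deliberately minimal sketch of the ingestion, embedding, and retrieval steps in plain Python. The bucket-hash `embed` function and the `TinyVectorStore` class are toy stand-ins invented for this illustration: in production you would call a real embedding model (e.g. via Azure OpenAI) and a real vector database such as Milvus, Pinecone, or FAISS.

```python
import math

DIM = 256  # toy embedding size; real embedding models use hundreds to thousands of dimensions


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: bucket each token into a
    fixed-size vector, then L2-normalize so dot product = cosine similarity."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        token = token.strip(".,!?")
        vec[sum(ord(c) for c in token) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class TinyVectorStore:
    """Minimal in-memory vector store, standing in for Milvus/Pinecone/FAISS."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def ingest(self, chunks: list[str]) -> None:
        """Ingestion + embedding: store each chunk alongside its vector."""
        for chunk in chunks:
            self.texts.append(chunk)
            self.vectors.append(embed(chunk))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Retrieval: rank stored chunks by cosine similarity to the query."""
        q = embed(query)
        scores = [sum(a * b for a, b in zip(v, q)) for v in self.vectors]
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        return [self.texts[i] for i in ranked[:k]]


store = TinyVectorStore()
store.ingest([
    "Invoices are processed nightly by the finance batch job.",
    "The VPN requires two-factor authentication for all employees.",
    "Expense reports must be filed within 30 days.",
])
# context holds the most relevant chunk, used to ground the model's answer
context = store.retrieve("vpn two-factor authentication", k=1)
```

In a full pipeline, the retrieved `context` is injected into the prompt before generation, and hallucination detection then checks the model's answer against those same chunks.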