Job Description
We are seeking a Senior Data Engineer, AI/ML with deep expertise in knowledge base construction, retrieval-augmented reasoning (RAQ/RAG), and Generative AI data pipelines to help advance Assent's R&D toward Agentic AI systems.
In this role, you will design, build, and maintain intelligent data infrastructure that supplies context, memory, and reasoning capabilities to autonomous AI agents. Your work will connect structured and unstructured enterprise data into continuously updated knowledge graphs and vector stores that empower dynamic retrieval, planning, and decision-making.
You will collaborate with AI/ML engineers, data scientists, and product teams to create scalable, auditable, and high-fidelity data pipelines that feed both assistive and autonomous AI functions. This position is ideal for someone who thrives at the intersection of data engineering, AI architecture, and knowledge representation.
Key Requirements & Responsibilities
- Design, build, and optimize data pipelines for Agentic and Generative AI systems, enabling context retrieval, multi-step reasoning, and adaptive knowledge updates.
- Develop and manage knowledge bases, vector stores, and graph databases to organize and retrieve information across diverse regulatory, product, and supplier domains.
- Engineer retrieval-augmented reasoning (RAQ/RAG) pipelines, integrating embedding generation, contextual chunking, and retrieval orchestration for LLM-driven agents.
- Collaborate cross-functionally with AI/ML, MLOps, Data, and Product teams to define data ingestion, transformation, and retrieval strategies aligned with evolving AI agent capabilities.
- Implement and automate workflows for ingestion of structured and unstructured content (documents, emails, APIs, metadata) into searchable, continuously enriched data stores.
- Design feedback and reinforcement loops that allow AI agents to validate, correct, and refine their knowledge sources over time.
- Ensure data quality, consistency, and traceability through schema validation, metadata tagging, and lineage tracking within knowledge and vector systems.
- Integrate monitoring and observability to measure retrieval performance, coverage, and model-data alignment for deployed agents.
- Collaborate with data governance and security teams to enforce compliance, access control, and Responsible AI data handling standards.
- Document schemas, pipelines, and data models to ensure reproducibility, knowledge sharing, and long-term maintainability.
- Stay at the forefront of AI data innovation, evaluating new technologies in graph reasoning, embedding architectures, autonomous data agents, and memory frameworks.
- Be familiar with corporate security policies and follow the guidance set out in Assent's processes and procedures.
Qualifications
We strongly value your talent, energy, and passion. It will also be valuable to Assent if you have the following qualifications:
- 8+ years of experience in data engineering or applied AI infrastructure, with hands-on expertise in knowledge-centric or agentic AI systems.
- Proven experience building retrieval-augmented generation (RAG) and retrieval-augmented reasoning/querying (RAQ) data pipelines.
- Strong proficiency in Python and SQL, with experience designing large-scale data processing and orchestration workflows (Airflow, Prefect, Step Functions, or similar).
- Deep familiarity with vector databases (e.g., Weaviate, Pinecone, FAISS, Elastic Vector Search, Milvus) and graph databases (e.g., Neo4j, AWS Neptune, ArangoDB).
- Hands-on experience with embedding generation, semantic indexing, and context chunking for LLM retrieval and reasoning.
- Experience with agentic AI protocols and orchestration frameworks such as Model Context Protocol (MCP), LangChain Agents, Semantic Kernel, DSPy, LlamaIndex Agents, or custom orchestration layers enabling seamless interaction between models, tools, and enterprise data sources.
- Knowledge of cloud data platforms (AWS preferred: S3, Glue, Lambda, ECS, Athena, Redshift) and infrastructure-as-code tools.
- Knowledge of data modeling, schema design, and indexing strategies for both relational and NoSQL systems.
- Understanding of LLM data workflows, including prompt evaluation, retrieval contexts, and fine-tuning data preparation.