Senior Data Engineer - Data Processing & Feature Engineering

6 years

Posted: 1 day ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description


Location: Coimbatore

Experience Level: 6+ years


About the Role

We are seeking exceptional Senior Data Engineers to build the data foundation powering Velogent AI's autonomous agents. You will design and implement large-scale data ingestion, processing, and feature engineering systems that transform unstructured enterprise data (invoices, documents, transactions, RFQs) into structured, high-quality datasets. Your work enables agentic AI systems to make accurate, compliance-aware decisions while maintaining data quality, lineage, and auditability standards required by regulated industries.


Core Responsibilities

  • Design and architect end-to-end data pipelines processing large volumes of unstructured enterprise data (documents, PDFs, transaction records, email, etc.)
  • Build sophisticated data ingestion frameworks supporting multiple data sources and formats with automated validation and quality checks
  • Implement large-scale data processing solutions using distributed computing frameworks handling terabytes of data efficiently
  • Develop advanced feature engineering pipelines extracting meaningful signals from unstructured data (document classification, entity extraction, semantic tagging)
  • Design data warehousing architecture supporting both operational (near real-time) and analytical queries for agentic AI reasoning
  • Build robust data quality frameworks ensuring high data accuracy critical for agent decision-making and regulatory compliance
  • Implement data governance patterns including lineage tracking, metadata management, and audit trails for regulated environments
  • Optimize data pipeline performance, reliability, and cost through intelligent partitioning, caching, and resource optimization
  • Lead data security implementation protecting sensitive information (PII, financial data, healthcare records) with encryption and access controls
  • Collaborate with AI engineers to understand data requirements and optimize data for model training and inference
  • Establish best practices for data documentation, SLA management, and operational excellence


Must-Have Qualifications

  • Unstructured Data Expertise: Production experience ingesting and processing large volumes of unstructured data (documents, PDFs, images, text, logs)
  • Large-Scale Data Processing: Advanced expertise with distributed data processing frameworks (Apache Spark, Flink, or cloud-native alternatives like AWS Glue)
  • Feature Engineering: Deep knowledge of advanced feature engineering techniques for ML systems, including automated feature extraction and transformation
  • Python Proficiency: Expert-level Python for data processing, ETL pipeline development, and data science workflows
  • NLP/Text Processing: Strong background in NLP and text analysis techniques for document understanding, entity extraction, and semantic processing
  • Data Architecture: Experience designing data warehouses, data lakes, or lakehouse architectures supporting both batch and real-time processing
  • ETL/ELT Pipeline Design: Proven expertise building production-grade ETL/ELT pipelines with error handling, retry logic, and monitoring
  • Cloud Data Platforms: Advanced experience with AWS data services (S3, Athena, Glue, RDS, DynamoDB) or equivalent cloud platforms
  • Data Quality & Governance: Understanding of data quality frameworks, metadata management, and data governance practices


Nice-to-Have Qualifications

  • Experience with document parsing and layout analysis libraries (unstructured.io, PyPDF, etc.)
  • Knowledge of information extraction pipelines and vector databases for semantic search
  • Familiarity with Apache Kafka or other event streaming platforms for real-time data processing
  • Experience with dbt (data build tool) or similar data transformation frameworks
  • Understanding of data privacy and compliance frameworks (GDPR, HIPAA, SOC2)
  • Experience optimizing costs in cloud data platforms through intelligent resource allocation
  • Background in building recommendation systems or ranking systems using feature engineering
  • Knowledge of graph databases and knowledge graphs for relationship extraction
  • Familiarity with computer vision techniques for document analysis and processing
  • Published work or open-source contributions in NLP, document processing, or data engineering


What You'll Work With

  • Large-scale document processing pipelines handling millions of invoices, contracts, and business documents
  • Apache Spark and distributed computing frameworks for ETL
  • AWS data services (S3, Glue, Athena, RDS) for data infrastructure
  • Advanced NLP and text processing libraries (spaCy, transformers, LangChain)
  • Vector databases and semantic search infrastructure
  • Data quality and monitoring frameworks
  • Cloud data warehouses and data lakes on AWS
  • Compliance and governance frameworks for regulated industries

