3 - 6 years

20 - 35 Lacs

Hyderabad, Bengaluru, Mumbai (All Areas)

Posted: 3 days ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description

We are seeking an experienced Data Engineer (AI/ML Ops) to design, build, and optimize data pipelines, orchestration frameworks, and data infrastructure supporting large-scale AI/ML programs. This role is pivotal in ensuring seamless data flow, from acquisition and preparation through annotation, validation, and delivery, so that AI/ML models can train on clean, well-structured, and compliant datasets at scale.

You will collaborate with Data Scientists, AI Engineers, IT Operations, and Product Teams, working across cloud environments (e.g., AWS, GCP, Azure) and integrating with internal and third-party tools to deliver end-to-end data solutions. The ideal candidate brings a blend of technical depth, an operational mindset, and a solid understanding of AI/ML data lifecycle requirements.

Role & responsibilities:

Data Pipeline Development & Infrastructure

  • Design, build, and maintain high-performance, scalable ETL/ELT pipelines capable of handling multi-terabyte, multi-modal datasets.
  • Develop data ingestion frameworks to support structured, semi-structured, and unstructured data (text, audio, video, images).
  • Implement real-time streaming data solutions where applicable (e.g., for ML models requiring continuous data feeds).
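A minimal sketch of the ETL pattern these responsibilities describe, using Pandas; the record structure, column names, and CSV target are illustrative stand-ins, not an actual TP pipeline:

```python
import pandas as pd

def extract(records):
    # Extract: load raw records into a DataFrame (stand-in for an S3/API source)
    return pd.DataFrame(records)

def transform(df):
    # Transform: drop rows missing required text, then normalise whitespace/case
    df = df.dropna(subset=["text"]).copy()
    df["text"] = df["text"].str.strip().str.lower()
    return df

def load(df, path):
    # Load: persist the cleaned dataset (CSV as a stand-in for a warehouse table)
    df.to_csv(path, index=False)
    return len(df)

raw = [{"id": 1, "text": "  Hello  "}, {"id": 2, "text": None}]
clean = transform(extract(raw))  # only the valid row survives
```

In a production pipeline the same extract/transform/load stages would typically run on Spark and be scheduled by an orchestrator such as Airflow rather than called inline.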

Data Collection & Acquisition

  • Develop and manage automated data collection frameworks (web scraping, API integrations, SDKs, crowdsourced collection pipelines).
  • Ensure data provenance, integrity, and compliance with GDPR, CCPA, and other regulatory standards.
  • Partner with external data vendors and internal gig-platforms to source high-quality, domain-specific datasets.
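One common shape of automated collection is pulling from a paginated API. As a hedged sketch, with `fetch_page` standing in for a real API client (names are illustrative):

```python
def collect_paginated(fetch_page, page_size=100):
    # Pull records page by page until the source returns an empty batch
    records, page = [], 0
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records

# Fake client: two pages of records, then an empty page ends the loop
pages = [[{"id": 0}, {"id": 1}], [{"id": 2}]]
result = collect_paginated(lambda p, n: pages[p] if p < len(pages) else [])
```

A real framework would add retries, rate limiting, and provenance metadata on each record to support the compliance requirements above.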

Orchestration Frameworks & Data Labeling Workflows

  • Build orchestration layers to manage large-scale data labeling/annotation projects, integrating with third-party platforms or internal tools.
  • Automate workflow management, task routing, and quality assurance loops to optimize throughput and accuracy.
  • Implement analytics dashboards to monitor labeling productivity, SLAs, and dataset health.
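The task-routing and QA loop described above can be sketched as a simple queue: tasks go to an annotation step, a QA check either accepts the label or re-queues the task, and repeated failures are escalated. The function names are illustrative, not an internal API:

```python
from collections import deque

def run_labeling_workflow(tasks, annotate, qa_check, max_retries=2):
    # Route each task through annotation, then a QA gate that re-queues failures
    queue = deque((task, 0) for task in tasks)
    completed, rejected = [], []
    while queue:
        task, attempts = queue.popleft()
        label = annotate(task)
        if qa_check(task, label):
            completed.append((task, label))
        elif attempts < max_retries:
            queue.append((task, attempts + 1))  # send back for re-annotation
        else:
            rejected.append(task)               # escalate after repeated QA failures
    return completed, rejected

# Toy run: uppercase "annotation", QA rejects empty labels
done, rejected = run_labeling_workflow(
    ["cat", ""], lambda t: t.upper(), lambda t, lbl: bool(lbl)
)
```

At scale the same loop is what an orchestrator (Airflow, Prefect, Dagster) manages across thousands of annotators, with the completed/rejected counts feeding the analytics dashboards mentioned above.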

Data Quality, Validation & Governance

  • Create data validation pipelines with anomaly detection, deduplication, and bias checks to ensure high-quality datasets.
  • Establish governance frameworks to manage dataset lineage, versioning, and traceability, ensuring readiness for AI regulatory compliance (EU AI Act, NIST AI RMF).
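A minimal illustration of the validation ideas above (exact-duplicate removal plus simple anomaly flags); the field names and thresholds are placeholders:

```python
def validate_dataset(rows, required, max_len=10_000):
    # Deduplicate rows, then flag anomalies: missing fields or oversized text
    seen, clean, anomalies = set(), [], []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue                      # drop exact duplicates
        seen.add(key)
        if any(row.get(f) in (None, "") for f in required):
            anomalies.append(row)         # missing required field
        elif len(str(row.get("text", ""))) > max_len:
            anomalies.append(row)         # suspiciously long record
        else:
            clean.append(row)
    return clean, anomalies

rows = [{"id": 1, "text": "a"}, {"id": 1, "text": "a"}, {"id": 2, "text": ""}]
clean, anomalies = validate_dataset(rows, required=["text"])
```

Production pipelines would extend this with statistical anomaly detection and bias checks, and record lineage/version metadata for each accepted row.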

Collaboration & Cross-Functional Enablement

  • Partner with Data Scientists and ML Engineers to understand data requirements for model training, fine-tuning, and evaluation.
  • Collaborate with Operations, Product Managers, and Client Teams to translate business requirements into technical solutions.
  • Contribute to playbooks, best practices, and internal knowledge bases to scale capabilities across TP.ai.

Automation & Continuous Improvement

  • Leverage workflow orchestration tools and CI/CD practices to automate and improve operational efficiency.
  • Optimize pipelines for cost efficiency, scalability, and latency in cloud environments.
  • Evaluate and implement emerging tools and technologies in data engineering and AI data operations.

Preferred candidate profile

Education & Experience

  • Bachelor's/Master's degree in Computer Science, Data Engineering, or a related field.
  • 5+ years of experience as a Data Engineer, preferably in AI/ML Ops environments.
  • Proven experience with large-scale data pipelines, data orchestration, and cloud-based data systems.

Technical Skills

  • Programming: Python (Pandas, PySpark), SQL, and one additional programming language (Java/Scala preferred).
  • Data Processing: Spark, Beam, Kafka, Flink.
  • Cloud Platforms: AWS (S3, Glue, Redshift), GCP (BigQuery, Dataflow), Azure Data Lake.
  • Workflow Orchestration: Airflow, Prefect, Luigi, Dagster.
  • CI/CD & Version Control: Git, Jenkins, GitHub Actions.
  • Experience with data annotation and crowdsourcing platforms.

Preferred Skills

  • Exposure to ML lifecycle management tools (MLflow, Kubeflow).
  • Familiarity with Responsible AI practices (bias detection, dataset auditing).
  • Experience working with distributed teams and gig-based workforce enablement.

Why Join TP?

At TP.ai, you won't just be building pipelines; you'll be shaping the backbone of next-generation AI systems used by some of the world's most innovative companies. As part of a global, fast-scaling team, you'll work on multi-modal datasets at massive scale (e.g., text, speech, video, sensors) and solve problems that directly impact the future of Generative AI, Computer Vision, Conversational AI, and AI Safety.

TP offers

  • Opportunities to experiment with emerging tools (MLflow, Kubeflow, Dagster, etc.) and push the boundaries of AI/ML Ops.
  • A collaborative, international environment where you can learn, grow, and lead in the evolving AI data services industry.
  • A mission-driven culture focused on Responsible AI and global impact.

Teleperformance (TP)

Business Process Outsourcing (BPO)

Paris
