Lead Assistant Manager

5 years

Posted: 17 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Job Summary

We are seeking a detail-oriented and highly skilled Data Annotator to support the development of AI and Machine Learning (ML) models by preparing, labeling, and curating large-scale datasets. The ideal candidate will have a strong understanding of annotation techniques and quality assurance for labeled data, along with practical exposure to cloud-based tools: primarily AWS SageMaker Ground Truth, and also GCP Data Labeling and Azure ML Data Labeling. This role is pivotal in ensuring the integrity, scalability, and accuracy of the data pipelines that power advanced AI systems.

The Data Annotator will collaborate closely with Data Scientists, Machine Learning Engineers, Cloud Architects, and Product Teams to deliver high-quality labeled datasets optimized for supervised learning, natural language processing (NLP), computer vision, and speech recognition models.

Key Responsibilities

Data Annotation & Labeling

  • Perform manual and semi-automated labeling of datasets across multiple modalities including text, audio, images, and video.
  • Create high-quality annotations for:
    • Text/NLP: Named Entity Recognition (NER), sentiment analysis, intent classification, part-of-speech tagging, conversation structuring, and chatbot training datasets.
    • Computer Vision: Bounding boxes, polygons, segmentation masks, key points, object tracking in videos, and OCR annotation.
    • Speech/Audio: Transcription, speaker diarization, phoneme tagging, emotion labeling, and acoustic event detection.
  • Conduct multi-tier annotation validation and apply inter-annotator agreement processes to ensure labeling accuracy.
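Inter-annotator agreement is commonly measured with Cohen's kappa when two annotators label the same items. The posting does not name a specific metric, so the choice of kappa is an assumption. The sample sentiment labels below are hypothetical. The sketch compares the observed agreement with the agreement you would expect by chance:

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's label counts.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: for each label, multiply the two annotators' frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both used one identical label everywhere, so kappa is 0/0.
        # By convention (an assumption), treat this as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sentiment labels from two annotators on six items.
annotator_1 = ["pos", "neg", "pos", "neu", "pos", "neg"]
annotator_2 = ["pos", "neg", "neu", "neu", "pos", "pos"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # → 0.478
```

In this example the raw agreement is 4 of 6 items (about 0.67). Kappa is lower because some of that agreement is expected by chance. This is why agreement checks usually report a chance-corrected figure rather than raw percent agreement.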

AWS & Cloud-Based Annotation

  • Leverage AWS SageMaker Ground Truth for scalable data labeling workflows including automated data labeling with active learning.
  • Implement quality control (QC) mechanisms in SageMaker Ground Truth, such as label audits, annotation consolidation, and labeling job monitoring.
  • Integrate annotated datasets into AWS S3, ensuring optimal storage structures and lifecycle policies.
  • Work with AWS Glue, Athena, and QuickSight for dataset validation, analysis, and reporting.
  • Work with GCP Data Labeling Services and Azure ML Data Labeling tools in multi-cloud environments (exposure is good to have).
  • Collaborate with Cloud Engineers to automate annotation workflows using Lambda functions, Step Functions, and event-driven pipelines.

Data Management & Quality Assurance

  • Perform data preprocessing: cleaning, normalization, anonymization (especially for PII data), and augmentation.
  • Apply data quality checks to maintain dataset balance, reduce bias, and enhance representativeness.
  • Document annotation guidelines, taxonomy structures, and ontology mapping for consistent labeling practices.
  • Ensure compliance with security and privacy regulations and standards (GDPR, HIPAA, SOC 2, ISO 27001) when working with sensitive datasets.
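A basic dataset balance check counts how often each label occurs and flags classes that fall below a minimum share. The posting does not specify the check. The 10% threshold and the object-detection labels below are hypothetical. A minimal sketch:

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return each label's fraction of the dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


def underrepresented(labels: Iterable[str], min_share: float = 0.1) -> Dict[str, float]:
    """Flag labels whose share is below min_share (a hypothetical threshold)."""
    shares = label_shares(labels)
    return {label: share for label, share in shares.items() if share < min_share}


# Hypothetical object-detection labels: 100 annotated boxes.
labels = ["car"] * 70 + ["pedestrian"] * 25 + ["cyclist"] * 5
print(underrepresented(labels))  # → {'cyclist': 0.05}
```

A flagged class might prompt targeted data collection or re-sampling before the dataset is handed to model training. The right threshold depends on the project.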

Collaboration & Continuous Improvement

  • Collaborate with ML Engineers and Data Scientists to refine annotation requirements based on evolving model performance.
  • Participate in regular feedback loops with AI developers to improve annotation accuracy and dataset utility.
  • Contribute to the design of annotation ontologies and label taxonomies for domain-specific projects (e.g., healthcare, finance, retail, manufacturing).
  • Stay updated on emerging annotation tools, AI-assisted labeling platforms, and best practices.

Core Skills

Required Skills & Competencies

  • Proven expertise in data annotation for AI/ML applications across text, image, and speech datasets.
  • Strong proficiency with AWS Cloud services, especially SageMaker Ground Truth, S3, and Glue.
  • Familiarity with annotation platforms and tools (Labelbox, Supervisely, CVAT, Prodigy, Doccano).
  • Knowledge of Python/SQL scripting for dataset preparation and automation.
  • Basic understanding of machine learning concepts (classification, object detection, NLP pipelines).
  • Familiarity with big data tools such as Apache Spark and Databricks (nice to have).

Domain Knowledge

  • Text/NLP: Language models, chatbot training, intent recognition.
  • Computer Vision: Object detection, OCR, autonomous systems labeling.
  • Audio/Speech: Transcription guidelines, phoneme labeling, acoustic datasets.
  • Understanding of industry datasets (healthcare records, retail data, insurance documents, call center logs).

Cloud Expertise

  • AWS (Priority): SageMaker Ground Truth, S3, Glue, Athena, QuickSight, IAM for role-based access control.
  • GCP (Good to Have): Vertex AI, AutoML, Data Labeling.
  • Azure (Good to Have): Azure ML Data Labeling, Azure Blob Storage, Azure Cognitive Services.

Qualifications

  • Bachelor’s degree in Computer Science, Data Science, Information Technology, or related field.
  • 2–5 years of experience in data annotation, data labeling, or dataset preparation for AI/ML projects.
  • Hands-on experience with AWS annotation workflows and multi-modal datasets.
  • Certification in AWS Machine Learning Specialty or AWS Data Analytics Specialty (preferred).
  • Exposure to annotation in regulated industries (healthcare, finance, retail, government projects) is a plus.

Performance Metrics

  • Annotation Quality: Accuracy and consistency of labeled data.
  • Efficiency: Volume of annotations completed within SLA.
  • Cloud Integration: Seamless delivery of datasets into AWS pipelines.
  • Error Reduction: Continuous improvement of data validation and annotation accuracy.
  • Collaboration: Effective communication with Data Science and Cloud Engineering teams.

Growth Path

  • Senior Data Annotator / Annotation Lead → managing teams of annotators.
  • Data Quality Analyst → leading data validation and audit processes.
  • ML Data Engineer → transitioning into dataset pipeline development roles.
  • AI/ML Specialist on AWS → specializing in automation and scaling of annotation pipelines

EXL

Business Process Management / Analytics

New York
