Data & AI Specialist - Data Scraping, Enrichment & Quality Assurance

Experience

0 years

Salary

0 Lacs

Posted: 16 hours ago | Platform: LinkedIn


Work Mode

Remote

Job Type

Full Time

Job Description

Overview

We're looking for a data-obsessed explorer who can build and maintain pipelines that collect, clean, and enhance large volumes of data, then apply AI tools to keep it accurate, useful, and ready for analysis. This is initially a project-based role, with the possibility of evolving into a full-time contract based on performance and business needs.

Key Responsibilities

  • Data Acquisition & Scraping
    • Design, develop, and maintain scalable web-scraping systems and APIs to collect structured and unstructured data from diverse sources.
    • Ensure compliance with data privacy laws (GDPR, CCPA) and site-specific terms of service.
  • Data Enrichment & Transformation
    • Implement pipelines to clean, normalize, and enrich raw data using third-party datasets, NLP (natural language processing), and machine learning techniques.
    • Build automated matching and deduplication processes to maintain a unified source of truth.
  • Quality Assurance & Monitoring
    • Create automated QA checks to validate data accuracy, completeness, and consistency.
    • Set up monitoring and alert systems to catch anomalies or pipeline failures early.
  • AI & Process Optimization
    • Integrate AI models for entity extraction, text classification, and predictive enrichment.
    • Work with the data science team to design features that feed analytics and machine learning models.
  • Collaboration & Documentation
    • Partner with product, engineering, and analytics teams to define data requirements and priorities.
    • Maintain clear technical documentation and data lineage records.
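To give candidates a concrete sense of the enrichment-and-QA work described above, here is a minimal Python sketch. The field names, matching key, and QA rule are all hypothetical, invented for illustration only:

```python
# Hypothetical mini-pipeline: normalize raw records, deduplicate on a
# derived key, and run a simple completeness check -- the same shape of
# work described under "Data Enrichment" and "Quality Assurance" above.

def normalize(record):
    """Lowercase and strip the string fields used for matching."""
    return {k: v.strip().lower() if isinstance(v, str) else v
            for k, v in record.items()}

def deduplicate(records, key_fields=("name", "email")):
    """Keep the first record seen for each derived key."""
    seen, unique = set(), []
    for rec in map(normalize, records):
        key = tuple(rec.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def qa_report(records, required=("name", "email")):
    """Count records missing any required field."""
    missing = sum(1 for r in records
                  if any(not r.get(f) for f in required))
    return {"total": len(records), "missing_required": missing}

raw = [
    {"name": "Acme Corp ", "email": "INFO@ACME.COM"},
    {"name": "acme corp", "email": "info@acme.com"},   # near-duplicate
    {"name": "Beta Ltd", "email": ""},                 # incomplete
]
clean = deduplicate(raw)
report = qa_report(clean)  # flags the record with a missing email
```

In production, the same pattern would run inside an orchestrated pipeline (e.g. an Airflow task) with the QA report feeding the monitoring and alerting described above.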

Requirements

  • Strong programming skills in Python (Scrapy, BeautifulSoup, Selenium, Playwright) or equivalent languages.
  • Experience with data pipelines and ETL tools (Airflow, Prefect, or similar).
  • Proficiency in SQL/NoSQL databases and data warehousing (e.g., BigQuery, Snowflake).
  • Familiarity with cloud platforms (AWS, GCP, or Azure) and containerization (Docker/Kubernetes).
  • Knowledge of machine learning workflows and libraries (scikit-learn, spaCy, Hugging Face) is a big plus.
  • Solid understanding of data privacy and ethical data collection practices.
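As a small taste of the scraping-side parsing this role involves, the sketch below extracts fields from an HTML fragment using only the standard library. In practice, tools like Scrapy or BeautifulSoup (listed above) replace this hand-rolled parser; the markup and class names here are invented for illustration:

```python
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Collect the text inside <span class="product"> elements."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # Enter "capture" mode when a product span opens.
        if tag == "span" and ("class", "product") in attrs:
            self.in_product = True

    def handle_data(self, data):
        if self.in_product:
            self.products.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_product = False

html = ('<ul><li><span class="product">Widget A</span></li>'
        '<li><span class="product">Widget B</span></li></ul>')
parser = ProductParser()
parser.feed(html)
# parser.products now holds the extracted names
```

A real scraper would add request handling, rate limiting, and retries on top of the parsing step, and would respect robots.txt and the privacy constraints noted above.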

Nice-to-Have

  • Experience with LLMs (large language models) for text enrichment.
  • Background in data visualization or BI tools (Tableau, Looker, Power BI).
  • Familiarity with real-time streaming data (Kafka, Kinesis).

Traits for Success

  • Detail-oriented with a knack for spotting hidden data issues.
  • Curious problem solver who loves automation and efficiency.
  • Comfortable in a fast-paced environment where requirements evolve quickly.

Benefits

  • Remote work.
  • Flexible work schedule.
  • Opportunity for a long-term contract.
