0 - 5 years

0 - 1 Lacs

Posted: 2 weeks ago | Platform: Indeed

Work Mode

On-site

Job Type

Full Time

Job Description

We’re looking for a hands-on Data Engineer to manage and scale our data scraping pipelines across 60+ websites. The job involves handling OCR-processed PDFs, ensuring data quality, and building robust, self-healing workflows that fuel AI-driven insights.

You’ll Work On:

Managing and optimizing Airflow scraping DAGs

Implementing validation checks, retry logic & error alerts

Cleaning and normalizing OCR text (Tesseract / AWS Textract)

Handling deduplication, formatting, and missing data

Maintaining MySQL/PostgreSQL data integrity

Collaborating with ML engineers on downstream pipelines

What You Bring:

2–5 years of hands-on experience in Python data engineering

Experience with Airflow, Pandas, and OCR tools

Solid SQL skills and schema design (MySQL/PostgreSQL)

Comfort with CSVs and building ETL pipelines

Required:

1. Scrapy or Selenium experience

2. CAPTCHA handling

3. Experience with PyMuPDF and regex

4. AWS S3

5. LangChain, LLMs, FastAPI

6. Streamlit

7. Matplotlib

Job Type: Full-time

Day shift

Work Location: In person

Pay: ₹70,000.00 - ₹150,000.00 per month

Application Question(s):

  • Total years of experience in web scraping / data extraction
  • Have you worked with large-scale data pipelines?
  • Are you proficient in writing complex Regex patterns for data extraction and cleaning?
  • Have you implemented or managed data pipelines using tools like Apache Airflow?
  • Years of experience with PDF parsing and OCR tools (e.g., Tesseract, Google Document AI, AWS Textract)
  • Years of experience handling complex PDF tables with merged rows, rotated layouts, or inconsistent formatting
  • Are you willing to relocate to Delhi if selected?
  • Current CTC
  • Expected CTC
