Data Engineer - Web Scraping & Enrichment

2 - 6 years

0 Lacs

Posted: 3 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As an entrepreneurial, passionate, and driven Data Engineer at Gala Intelligence, a startup backed by Navneet Tech Venture, you will play a crucial role in shaping the company's technology vision, architecture, and engineering culture from the beginning. Your contributions will be foundational in developing best practices and establishing the engineering team.

**Key Responsibilities:**

- **Web Scraping & Crawling:** Build and maintain automated scrapers to extract structured and unstructured data from websites, APIs, and public datasets.
- **Scalable Scraping Systems:** Develop multi-threaded, distributed crawlers capable of handling high-volume data collection without interruptions.
- **Data Parsing & Cleaning:** Normalize scraped data, remove noise, and ensure consistency before passing it to data pipelines.
- **Anti-bot & Evasion Tactics:** Implement proxy rotation, captcha solving, and request throttling techniques to handle scraping restrictions.
- **Integration with Pipelines:** Deliver clean, structured datasets into NoSQL stores and ETL pipelines for further enrichment and graph-based storage.
- **Data Quality & Validation:** Ensure data accuracy, deduplicate records, and maintain a trust scoring system for data confidence.
- **Documentation & Maintenance:** Keep scrapers updated when websites change, and document scraping logic for reproducibility.

**Qualifications Required:**

- 2+ years of experience in web scraping, crawling, or data collection.
- Strong proficiency in Python (libraries like BeautifulSoup, Scrapy, Selenium, Playwright, Requests).
- Familiarity with NoSQL databases (MongoDB, DynamoDB) and data serialization formats (JSON, CSV, Parquet).
- Experience handling large-scale scraping with proxy management and rate limiting.
- Basic knowledge of ETL processes and integration with data pipelines.
- Exposure to graph databases (Neo4j) is a plus.

As part of Gala Intelligence, you will be working in a tech-driven startup dedicated to solving fraud detection and prevention challenges. The company values transparency, collaboration, and individual ownership, creating an environment where talented individuals can thrive and contribute to impactful solutions. If you enjoy early-stage challenges, thrive on owning the entire tech stack, and are passionate about building innovative, scalable solutions, we encourage you to apply. Join us in leveraging technology to combat fraud and make a meaningful impact from day one.
