Web Crawler Engineer

Posted: 2 days ago | Platform: LinkedIn

Job Description

Location: Remote / India

Job Type: Full-time

Experience: 3+ years in web crawling/scraping, backend systems, and data extraction


About Client

Our client is a modern, meaning-based web-search API designed specifically for AI applications such as retrieval-augmented generation (RAG). Unlike traditional keyword-based engines, it uses embedding-based semantic search, allowing developers to fetch content that is contextually relevant and up to date.


About the Role:

We are looking for a skilled Web Crawler Engineer to design, develop, and maintain scalable web crawling and scraping systems. The ideal candidate should be well-versed in handling large-scale data extraction, parsing unstructured web data, dealing with anti-bot mechanisms, and managing crawling infrastructure.

We are looking for someone who has built web crawlers before and is prepared for a demanding workload. The target is to crawl 3M URLs per hour and add them to the in-house index. You must be good at, or strongly interested in, high-performance engineering as the company scales its vector DB. You will have unlimited compute for the biggest web crawling push of your career.
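
To give a sense of scale, the 3M URLs per hour target can be turned into a per-second rate and a rough number of in-flight requests. This is only a back-of-the-envelope sketch: the 0.5 s average fetch latency and the function name `required_concurrency` are assumptions for illustration, not figures from the client.

```python
import math

URLS_PER_HOUR = 3_000_000  # target stated in the role description


def required_concurrency(urls_per_hour: int, avg_latency_s: float) -> int:
    """Estimate in-flight requests via Little's law (rate * time in system)."""
    return math.ceil(urls_per_hour * avg_latency_s / 3600)


# Sustained fetch rate needed to hit the hourly target.
rate_per_second = URLS_PER_HOUR / 3600
print(f"Sustained rate: about {rate_per_second:.0f} URLs per second")

# Assumed average latency of 0.5 s per fetch (hypothetical value).
print("Concurrent requests needed:", required_concurrency(URLS_PER_HOUR, 0.5))
```

Under that assumed latency, the target works out to roughly 833 fetches per second and a little over 400 requests in flight at any moment, before retries, rendering, or politeness delays are taken into account.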


Key Responsibilities:
  • Develop robust, scalable, and efficient web crawlers to extract structured/unstructured data from dynamic websites.
  • Design and implement data pipelines to process, clean, and store scraped data in databases or data lakes.
  • Monitor and maintain crawling systems to ensure reliability, data accuracy, and performance.
  • Handle anti-bot measures (e.g., CAPTCHAs, IP blocks, dynamic content loading) using techniques like headless browsing, proxies, and rotating user agents.
  • Ensure compliance with site-specific terms of service and data privacy policies.
  • Collaborate with data scientists, backend engineers, and product managers to support business goals through reliable data feeds.


Required Skills:
  • Strong experience with web scraping tools/frameworks (e.g., Scrapy, Puppeteer, Selenium, Playwright).
  • Proficiency in Python is essential; familiarity with JavaScript or Go is a plus.
  • Hands-on experience in parsing HTML/XML/JSON using BeautifulSoup, lxml, or similar libraries.
  • Minimum 3 years of experience with headless browsers and automation tools (e.g., Puppeteer or Playwright).
  • Good understanding of networking, HTTP protocols, headers, cookies, and sessions.
  • Familiarity with databases (SQL or NoSQL – e.g., PostgreSQL, MongoDB, Elasticsearch).
  • Experience using proxies, VPNs, and user-agent rotation to bypass crawling limitations.
  • Familiarity with task queues and schedulers (e.g., Celery, Airflow, Cron).
  • Understanding of cloud services (AWS, GCP, or Azure) and containerization tools (Docker, Kubernetes) is a plus.


Preferred Qualifications:
  • Bachelor’s/Master’s degree in Computer Science, Engineering, or related field.
  • Experience handling large-scale crawls (millions of pages per day).
  • Knowledge of ethical scraping practices and legal considerations (e.g., robots.txt, GDPR).
  • Exposure to data pipelines and distributed systems (e.g., Kafka, Spark).
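
As one illustration of the robots.txt point above, a crawler can check each URL against a site's rules before fetching it. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `example-crawler` user agent, and the example.com URLs are made up for the example and do not describe the client's setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "example-crawler" is an assumed user-agent name.
allowed = parser.can_fetch("example-crawler", "https://example.com/blog/post")
blocked = parser.can_fetch("example-crawler", "https://example.com/private/data")
delay = parser.crawl_delay("example-crawler")

print("blog allowed:", allowed)
print("private allowed:", blocked)
print("crawl delay (s):", delay)
```

In a real crawl, the rules would be fetched and cached per host, and the crawl delay would feed into the scheduler's per-domain rate limit.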


Tools & Technologies (Nice to Have):
  • Scrapy, Selenium, Puppeteer, Playwright
  • Python, JavaScript
  • Beautiful Soup, lxml
  • Redis, Kafka, PostgreSQL, MongoDB
  • AWS/GCP, Docker, Git
  • Airflow, Jenkins


What We Offer:
  • Competitive salary and performance-based incentives
  • Opportunity to work on impactful data engineering problems
  • Flexible work hours and remote-first culture
  • Learning and development allowance
  • Collaborative and inclusive team environment

