
34 Web Crawling Jobs - Page 2

Set up a Job Alert
JobPe aggregates listings for easy access, but you apply directly on the original job portal.

5.0 - 7.0 years

6 - 8 Lacs

Kolkata

Remote

Note: Please don't apply if you do not have at least 3 years of Scrapy experience.

We are seeking a highly experienced Web Scraping Expert specialising in Scrapy-based web scraping and large-scale data extraction. This role focuses on building and optimizing web crawlers, handling anti-scraping measures, and ensuring efficient data pipelines for structured data collection. The ideal candidate will have 5+ years of hands-on experience developing Scrapy-based scraping solutions, implementing advanced evasion techniques, and managing high-volume web data extraction. You will collaborate with a cross-functional team to design, implement, and optimize scalable scraping systems that deliver high-quality, structured data for critical business needs.

Key Responsibilities

Scrapy-based Web Scraping Development
- Develop and maintain scalable web crawlers using Scrapy to extract structured data from diverse sources.
- Optimize Scrapy spiders for efficiency, reliability, and speed while minimizing detection risks.
- Handle dynamic content using middlewares, browser-based scraping (Playwright/Selenium), and API integrations.
- Implement proxy rotation, user-agent switching, and CAPTCHA-solving techniques to bypass anti-bot measures (a spider sketch follows this listing).

Advanced Anti-Scraping Evasion Techniques
- Utilize AI-driven approaches to adapt to bot detection and prevent blocks.
- Implement headless browser automation and request-mimicking strategies to simulate human behavior.

Data Processing & Pipeline Management
- Extract, clean, and structure large-scale web data into formats such as JSON, CSV, and databases.
- Optimize Scrapy pipelines for high-speed data processing and storage in MongoDB, PostgreSQL, or cloud storage (AWS S3).

Code Quality & Performance Optimization
- Write clean, well-structured, and maintainable Python code for scraping solutions.
- Implement automated testing for data accuracy and scraper reliability.
- Continuously improve crawler efficiency by minimizing IP bans, request delays, and resource consumption.

Required Skills and Experience

Technical Expertise
- 5+ years of professional experience in Python development with a focus on web scraping.
- Proficiency in Scrapy-based scraping.
- Strong understanding of HTML, CSS, JavaScript, and browser behavior.
- Experience with Docker is a plus.
- Expertise in handling APIs (RESTful and GraphQL) for data extraction.
- Proficiency in database systems such as MongoDB and PostgreSQL.
- Strong knowledge of version control systems like Git and collaboration platforms like GitHub.

Key Attributes
- Strong problem-solving and analytical skills, with a focus on efficient solutions for complex scraping challenges.
- Excellent communication skills, both written and verbal.
- A passion for data and a keen eye for detail.

Why Join Us?
- Work on cutting-edge scraping technologies and AI-driven solutions.
- Collaborate with a team of talented professionals in a growth-driven environment.
- Opportunity to influence the development of data-driven business strategies through advanced scraping techniques.
- Competitive compensation and benefits.
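A minimal sketch of the kind of Scrapy spider this listing describes, with a user-agent-rotating downloader middleware and a polite download delay. The user-agent strings are truncated placeholders, the target is a public scraping sandbox rather than a production site, and the middleware registration assumes the file is run directly as a script.

```python
import random

import scrapy
from scrapy.crawler import CrawlerProcess

# Hypothetical user-agent pool; a production crawler would use a much
# larger, regularly refreshed list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]


class RotateUserAgentMiddleware:
    """Downloader middleware that swaps the User-Agent on every request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]  # public practice site
    custom_settings = {
        # "__main__" resolves because this sketch is run as a script.
        "DOWNLOADER_MIDDLEWARES": {"__main__.RotateUserAgentMiddleware": 543},
        "DOWNLOAD_DELAY": 1.0,  # request pacing to reduce detection risk
    }

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)


if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(QuotesSpider)
    process.start()  # blocks until the crawl finishes
```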

Posted 1 month ago

Apply

5.0 - 10.0 years

10 - 20 Lacs

Jaipur

Remote

Summary
To enhance user profiling and risk assessment, we are building web crawlers to collect relevant user data from third-party sources, forums, and the dark web. We are seeking a Senior Web Crawler & Data Extraction Engineer to design and implement these data collection solutions.

Job Responsibilities
- Design, develop, and maintain web crawlers and scrapers to extract data from open web sources, forums, marketplaces, and the dark web.
- Implement data extraction pipelines that aggregate, clean, and structure data for fraud detection and risk profiling.
- Use Tor, VPNs, and other anonymization techniques to safely crawl the dark web while avoiding detection (a Tor-proxy sketch follows this listing).
- Develop real-time monitoring solutions for tracking fraudulent activities, data breaches, and cybercrime discussions.
- Optimize crawling speed and ensure compliance with website terms of service, ethical standards, and legal frameworks.
- Integrate extracted data with fraud detection models, risk-scoring algorithms, and cybersecurity intelligence tools.
- Work with data scientists and security analysts to develop threat intelligence dashboards from collected data.
- Implement anti-bot detection evasion techniques and handle CAPTCHAs using AI-driven solvers where necessary.
- Stay updated on OSINT (Open-Source Intelligence) techniques, web scraping best practices, and cybersecurity trends.

Requirements
- 5+ years of experience in web crawling, data scraping, or cybersecurity data extraction.
- Strong proficiency in Python and frameworks such as Scrapy, Selenium, BeautifulSoup, or Puppeteer.
- Experience working with Tor, proxies, and VPNs for anonymous web scraping.
- Deep understanding of HTTP protocols, web security, and bot detection mechanisms.
- Experience parsing structured and unstructured data from JSON, XML, and web pages.
- Strong knowledge of database management (SQL, NoSQL) for storing large-scale crawled data.
- Familiarity with AI/ML-based fraud detection techniques and data classification methods.
- Experience working with cybersecurity intelligence sources, dark web monitoring, and OSINT tools.
- Ability to implement scalable, distributed web crawling architectures.
- Knowledge of data privacy regulations (GDPR, CCPA) and ethical data collection practices.

Nice to Have
- Experience in fintech, fraud detection, or threat intelligence.
- Knowledge of natural language processing (NLP) for analyzing cybercrime discussions.
- Familiarity with machine learning-driven anomaly detection for fraud prevention.
- Hands-on experience with cloud-based big data solutions (AWS, GCP, Azure, Elasticsearch, Kafka).
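As a hedged illustration of the anonymized-crawling requirement above, this sketch routes requests through a local Tor SOCKS proxy. It assumes a Tor daemon is already listening on the default port 9050 and that requests has SOCKS support installed (pip install requests[socks]); the check URL reports whether traffic actually arrived via Tor.

```python
import requests

# socks5h (rather than socks5) makes DNS resolution happen inside Tor as
# well, so hostnames never leak to the local resolver.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


def fetch_via_tor(url: str, timeout: float = 30.0) -> str:
    """Fetch a page through the Tor proxy and return its body as text."""
    response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    # check.torproject.org reports whether the request came through Tor.
    print(fetch_via_tor("https://check.torproject.org/")[:300])
```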

Posted 1 month ago

Apply

3.0 - 5.0 years

3 - 7 Lacs

Chennai

Work from Office

Job Information
Job Opening ID: ZR_2129_JOB
Date Opened: 05/03/2024
Industry: Technology
Job Title: DB Developer - Python
Work Experience: 3-5 years
City: Chennai
Province: Tamil Nadu
Country: India
Postal Code: 600001
Number of Positions: 5

Good knowledge of Python, SQL, and Perl with 6+ years of experience. Good problem-solving skills. Ability to understand the data and its relations. Capability to learn new technologies in a short span of time. Should be able to work in Sprints and meet deadlines. Flexible work time.

Mandatory Skills:
- Python: basics, Pandas, web scraping, file and XML handling, extracting/manipulating Excel/CSV/other file formats (a pandas sketch follows this listing).
- Perl: basics, CPAN modules, file handling and web scraping.

Work-from-home option is available.
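A small illustration of the pandas and file-handling skills this listing asks for: read every HTML table on a page, take the first, and export it to CSV and Excel. The URL and output file names are placeholders; read_html needs lxml installed, and the Excel write needs openpyxl.

```python
import pandas as pd

# Placeholder source page; any page with a <table> element works.
URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"


def main() -> None:
    tables = pd.read_html(URL)  # parses every <table> on the page
    df = tables[0]  # demo: keep only the first table
    df.to_csv("table.csv", index=False)  # CSV export
    df.to_excel("table.xlsx", index=False)  # Excel export (needs openpyxl)
    print(df.head())


if __name__ == "__main__":
    main()
```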

Posted 1 month ago

Apply

2.0 - 5.0 years

9 - 12 Lacs

Mumbai

Work from Office

Business Unit: Global Technology
Reporting To: Director, Head of Enterprise AI
Shift: EMEA (1:30 pm - 10:30 pm IST) (India)

About Russell Investments, Mumbai:
Russell Investments is a leading outsourced financial partner and global investment solutions firm providing a wide range of investment capabilities to institutional investors, financial intermediaries, and individual investors around the world. Building on an 89-year legacy of continuous innovation to deliver exceptional value to clients, Russell Investments works every day to improve the financial security of its clients. The firm is a Top 12 ranked consultant (2009-2024) in the P&I survey 2024, with $906 billion in assets under advisement (as of December 31, 2024) and $331.9 billion in assets under management (as of March 31, 2025) for clients in 30 countries. Headquartered in Seattle, Washington in the United States, Russell Investments has offices around the world, including London, New York, Toronto, Sydney, Tokyo, and Shanghai, and opened a new office in Mumbai, India in June 2023.

Joining the Mumbai office is an incredible opportunity to work closely with global stakeholders to support the technology and infrastructure that drives the investment and trading processes of a globally recognized asset management firm. Be part of the team based out of Goregaon (East) and contribute to the foundation and culture of the firm's growing operations in India. The Mumbai office operates with varying shifts to accommodate time zones around the world. For more information, please visit https://www.russellinvestments.com.

Job Description:

Role Summary
This role is responsible for supporting and growing the AI strategy, platform, and deliverables at Russell. We are looking for a curious and analytical individual who will research, develop, implement, and maintain processes to meet the needs of our AI strategy and deliver on business objectives. This is an excellent opportunity to take advantage of emerging trends and technologies and make a real-world difference.

Years of Experience
Suitable candidates would have 2-5 years of programming/artificial intelligence experience along with some knowledge of machine learning.

Qualifications
- Bachelor's degree in Computer Science, Engineering, Finance, Economics, Statistics, or a related field. Advanced degree preferred.
- Proficient in Python and SQL (R or C# a plus).
- Exposure to TensorFlow, PyTorch, and NLP techniques.
- Proven experience in developing Generative AI applications.
- Strong experience with Selenium, Beautiful Soup, and/or other web crawling techniques.
- Experience working with large-scale datasets for speech, video, and text.
- Familiarity with Whisper models, speech-to-text, video intelligence, and chatbot frameworks.
- Experience with a DevOps toolkit is a plus.
- Strong analytical skill set with the ability to analyze complex data.
- Ability to read, analyze, and interpret financial reports, tax documents, etc.
- Excellent problem-solving and debugging skills.
- Ability to work collaboratively in a fast-paced environment.

Responsibilities
- Support and develop our Python code infrastructure.
- Design and implement AI-powered speech and video processing solutions.
- Develop and optimize deep learning models for speech recognition, language modeling, and computer vision.
- Improve chatbot capabilities by integrating multimodal AI components.
- Create RAG workflows to ensure seamless AI tool integration (a retrieval sketch follows this listing).
- Stay updated with the latest AI research and bring innovative ideas to the team.
- Document workflows, models, and AI applications to ensure scalability and reproducibility.
- Work closely with business units to understand project requirements and deliver solutions that meet business objectives.
- Troubleshoot, debug, and optimize code to ensure high performance and reliability of AI applications.
- Stay abreast of the latest developments in AI, integrating new technologies into projects as appropriate.
- Stay familiar with ethical AI and web scraping principles.

Core Values
- Strong interpersonal, oral, and written communication and collaboration skills with all levels of management.
- Strong organizational skills, including the ability to adapt to shifting priorities and meet frequent deadlines.
- Demonstrated proactive approach to problem-solving with strong judgment and decision-making capability.
- Highly resourceful and collaborative team player, with the ability to also be independently effective and exude initiative and a sense of urgency.
- Exemplifies our customer-focused, action-oriented, results-driven culture.
- Forward-looking thinker who actively seeks opportunities, has a desire for continuous learning, and proposes solutions.
- Ability to act with discretion and maintain complete confidentiality.
- Dedicated to the firm's values of non-negotiable integrity, valuing our people, exceeding client expectations, and embracing intellectual curiosity and rigor.
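The RAG responsibility above hinges on a retrieval step. Below is a deliberately small sketch of that step, using TF-IDF cosine similarity from scikit-learn as a stand-in for a production embedding model; the documents and the query are toy placeholders, not the firm's actual data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document store standing in for a real indexed corpus.
DOCUMENTS = [
    "Quarterly fund performance summary for institutional clients.",
    "Chatbot escalation policy for client support queries.",
    "Speech-to-text pipeline notes for earnings-call transcription.",
]


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(DOCUMENTS)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    ranked = sorted(zip(scores, DOCUMENTS), reverse=True)
    return [doc for _, doc in ranked[:k]]


if __name__ == "__main__":
    # The retrieved text would be injected into the LLM prompt as context.
    for doc in retrieve("transcribe speech from video calls"):
        print(doc)
```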

Posted 1 month ago

Apply

7.0 - 12.0 years

12 - 22 Lacs

Bengaluru

Remote

Role & responsibilities
As a Data Engineer focused on web crawling and platform data acquisition, you will design, develop, and maintain large-scale web scraping pipelines to extract valuable platform data. You will be responsible for implementing scalable and resilient data extraction solutions, ensuring seamless data retrieval while working with proxy management, anti-bot bypass techniques, and data parsing. Optimizing scraping workflows for performance, reliability, and efficiency will be a key part of your role. Additionally, you will ensure that all extracted data maintains high quality and integrity.

Preferred candidate profile
We are seeking candidates with:
- Strong experience in Python and web scraping frameworks such as Scrapy, Selenium, Playwright, or BeautifulSoup (a Playwright sketch follows this listing).
- Knowledge of distributed web crawling architectures and job scheduling.
- Familiarity with headless browsers, CAPTCHA-solving techniques, and proxy management to handle dynamic web challenges.
- Experience with data storage solutions, including SQL and cloud storage.
- Understanding of big data technologies like Spark and Kafka (a plus).
- Strong debugging skills to adapt to website structure changes and blockers.
- A proactive, problem-solving mindset and the ability to work effectively in a team-driven environment.
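For the headless-browser side of this profile, a minimal Playwright sketch that renders a JavaScript-driven page and extracts element text. The target is a public practice site and the selector is specific to it; assumes pip install playwright followed by playwright install chromium.

```python
from playwright.sync_api import sync_playwright


def scrape_quotes(url: str) -> list[str]:
    """Render a JavaScript-heavy page headlessly and pull out element text."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for JS to populate
        quotes = page.locator("span.text").all_inner_texts()
        browser.close()
    return quotes


if __name__ == "__main__":
    # This page renders its content entirely client-side, so plain
    # requests + BeautifulSoup would see an empty shell.
    for quote in scrape_quotes("https://quotes.toscrape.com/js/"):
        print(quote)
```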

Posted 1 month ago

Apply

7.0 - 11.0 years

12 - 19 Lacs

Bengaluru

Work from Office

Responsibilities: As a Data Engineer focused on web crawling and platform data acquisition, you will design, develop, and maintain large-scale web scraping pipelines to extract valuable platform data.

Benefits: Annual bonus, health insurance, provident fund.

Posted 2 months ago

Apply

3 - 6 years

6 - 10 Lacs

Noida

Work from Office

Python Developer
Location: Sector-1, Noida (Work from Office)
Experience: Minimum 3 years
Education: B.E./B.Tech

Primary Role:
- Perform web scraping and crawling to extract and structure data from various websites.
- Handle data cleaning, transformation, and storage in structured formats.
- Write efficient and scalable Python scripts to manage high-volume data extraction tasks (a threaded-fetching sketch follows this listing).
- Monitor and manage log files using automation scripts.

Key Skills:
- Proficiency in Python with hands-on experience in web scraping and crawling.
- Strong working knowledge of BeautifulSoup, Selenium, NumPy, Pandas, and Pytest.
- Good understanding of JavaScript, HTML, and SQL (preferably MS SQL).
- Experience with MongoDB is an added advantage.
- Ability to integrate multiple data sources and databases into a single pipeline.
- Solid understanding of Python threading and multiprocessing, event-driven programming, and scalable, modular application design.

Preferred Skills:
- Practical experience in writing and maintaining web crawlers and scrapers.
- Familiarity with anti-bot mechanisms and techniques to bypass them responsibly.
- Exposure to handling large datasets and ensuring data accuracy and completeness.
- Experience with automated testing using Pytest.
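A sketch of the threading requirement in this listing: fan I/O-bound fetches out across a thread pool with concurrent.futures. The endpoint URLs are placeholders; CPU-bound parsing would call for multiprocessing instead, since threads share one GIL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

# Placeholder endpoints; httpbin echoes request data for testing.
URLS = [f"https://httpbin.org/get?page={i}" for i in range(10)]


def fetch(url: str) -> tuple[str, int]:
    """Fetch one URL; the work is I/O-bound, so threads fit well."""
    response = requests.get(url, timeout=15)
    return url, response.status_code


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=5) as pool:
        futures = [pool.submit(fetch, url) for url in URLS]
        for future in as_completed(futures):
            url, status = future.result()
            print(status, url)
```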

Posted 2 months ago

Apply

5 - 8 years

15 - 25 Lacs

Pune

Hybrid

Role & responsibilities
- Data Pipeline Development: Design, develop, and maintain data pipelines utilizing Google Cloud Platform (GCP) services like Dataflow, Dataproc, and Pub/Sub.
- Data Ingestion & Transformation: Build and implement data ingestion and transformation processes using tools such as Apache Beam and Apache Spark (a Beam sketch follows this listing).
- Data Storage Management: Optimize and manage data storage solutions on GCP, including BigQuery, Cloud Storage, and Cloud SQL.
- Security Implementation: Implement data security protocols and access controls with GCP's Identity and Access Management (IAM) and Cloud Security Command Center.
- System Monitoring & Troubleshooting: Monitor and troubleshoot data pipelines and storage solutions using GCP's Stackdriver and Cloud Monitoring tools.
- Generative AI Systems: Develop and maintain scalable systems for deploying and operating generative AI models, ensuring efficient use of computational resources.
- Gen AI Capability Building: Build generative AI capabilities among engineers, covering areas such as knowledge engineering, prompt engineering, and platform engineering.
- Knowledge Engineering: Gather and structure domain-specific knowledge to be utilized effectively by large language models (LLMs).
- Prompt Engineering: Design effective prompts to guide generative AI models, ensuring relevant, accurate, and creative text output.
- Collaboration: Work with data experts, analysts, and product teams to understand data requirements and deliver tailored solutions.
- Automation: Automate data processing tasks using scripting languages such as Python.
- Best Practices: Participate in code reviews and contribute to establishing best practices for data engineering within GCP.
- Continuous Learning: Stay current with GCP service innovations and advancements across core data services (GCS, BigQuery, Cloud Storage, Dataflow, etc.).

Skills and Experience:
- Experience: 5+ years of experience in Data Engineering or similar roles.
- Proficiency in GCP: Expertise in designing, developing, and deploying data pipelines, with strong knowledge of GCP core data services (GCS, BigQuery, Cloud Storage, Dataflow, etc.).
- Generative AI & LLMs: Hands-on experience with Generative AI models and large language models (LLMs) such as GPT-4, LLAMA3, and Gemini 1.5, with the ability to integrate these models into data pipelines and processes.
- Experience in web scraping.
- Technical Skills: Strong proficiency in Python and SQL for data manipulation and querying. Experience with distributed data processing frameworks like Apache Beam or Apache Spark is a plus.
- Security Knowledge: Familiarity with data security and access control best practices.
- Collaboration: Excellent communication and problem-solving skills, with a demonstrated ability to collaborate across teams.
- Project Management: Ability to work independently, manage multiple projects, and meet deadlines.
- Preferred Knowledge: Familiarity with Sustainable Finance, ESG Risk, CSRD, Regulatory Reporting, cloud infrastructure, and data governance best practices.
- Bonus Skills: Knowledge of Terraform is a plus.

Education:
- Degree: Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- Experience: 3-5 years of hands-on experience in data engineering.
- Certification: Google Professional Data Engineer.
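To ground the Apache Beam item above, a minimal pipeline that runs locally on the DirectRunner: read lines, split, filter malformed rows, and write back out. The file names and transforms are illustrative placeholders, not the team's actual pipeline.

```python
import apache_beam as beam


def run() -> None:
    # With no options given, Beam uses the local DirectRunner.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText("events.csv")  # placeholder input
            | "Parse" >> beam.Map(lambda line: line.split(","))
            | "KeepValid" >> beam.Filter(lambda row: len(row) >= 2)
            | "Format" >> beam.Map(lambda row: ",".join(row[:2]))
            | "Write" >> beam.io.WriteToText("clean_events")  # placeholder output
        )


if __name__ == "__main__":
    run()
```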

Posted 2 months ago

Apply

2 - 4 years

5 - 12 Lacs

Nagpur, Pune, Mumbai (All Areas)

Work from Office

Role & responsibilities

Job Overview:
We are looking for a highly motivated Junior Data Engineer with a passion for web scraping and web crawling to join our team. The ideal candidate will have strong Python programming skills and experience with web scraping frameworks and libraries like Requests, BeautifulSoup, Selenium, Playwright, or urllib. You will be responsible for building efficient and scalable web scrapers, extracting valuable data, and ensuring data integrity. This role requires a keen eye for problem-solving, the ability to work with complex data structures, and a strong understanding of web technologies like HTML, CSS, DOM, XPath, and regular expressions. Knowledge of JavaScript would be an added advantage.

Responsibilities:
- Apply your knowledge to fetch data from multiple online sources.
- Develop highly reliable web scrapers and parsers across various websites.
- Extract structured/unstructured data and store it in SQL/NoSQL data stores.
- Work closely with Project/Business/Research teams to provide scraped data for analysis.
- Maintain the scraping projects delivered to production.
- Develop frameworks for automating and maintaining a constant flow of data from multiple sources.
- Work independently with minimum supervision.
- Develop a deep understanding of the data sources on the web and know exactly how, when, and which data to scrape, parse, and store.

Required Skills and Experience:
- 1 to 2 years of experience as a web scraper.
- Proficient knowledge of Python and working knowledge of web crawling/web scraping in Python using Requests, BeautifulSoup or urllib, and Selenium or Playwright (an XPath parsing sketch follows this listing).
- Strong knowledge of basic Linux commands for system navigation, management, and troubleshooting.
- Expertise in proxy usage to ensure secure and efficient network operations.
- Experience with captcha-solving techniques for seamless automation and data extraction.
- Experience with data parsing: strong knowledge of regular expressions, HTML, CSS, DOM, and XPath.
- Knowledge of JavaScript would be a plus.

Preferred candidate profile:
- Must be able to access, manipulate, and transform data from a variety of database and flat-file sources. MongoDB and MySQL skills are essential.
- Must be able to develop reusable code-based scraping products which can be used by others.
- Git knowledge is mandatory for version control and collaborative development workflows.
- Experience handling cloud servers on platforms like AWS, GCP, and LEAPSWITCH for scalable and reliable infrastructure management.
- Ability to ask the right questions and deliver the right results in a way that is understandable and usable to your clients.
- A track record of digging into tough problems, attacking them from different angles, and bringing innovative approaches to bear is highly desirable. Must be capable of self-teaching new techniques.

Behavioural expectations:
- Be excited by, and keep a positive outlook while navigating, ambiguity.
- Passion for results and excellence.
- Team player: able to get the job done by working collaboratively with others.
- Inquisitive, analytical mind; out-of-the-box thinking.
- Prioritize among competing opportunities, balance consumer needs with business and product priorities, and clearly articulate the rationale behind product decisions.
- Straightforward and professional.
- Good communicator.
- Maintains high energy and motivates others.
- A do-it-yourself orientation, consistent with the company's roll-up-the-sleeves culture.
- Proactive.
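As a small illustration of the XPath parsing skills this listing names, the sketch below runs lxml over an inline HTML snippet so it needs no network call; a real scraper would first fetch pages with Requests or Playwright. The product markup and selectors are invented for the example.

```python
from lxml import html

# Inline sample markup standing in for a fetched page.
SAMPLE = """
<html><body>
  <div class="product"><h2>Widget A</h2><span class="price">Rs. 499</span></div>
  <div class="product"><h2>Widget B</h2><span class="price">Rs. 999</span></div>
</body></html>
"""


def parse_products(page: str) -> list[dict]:
    """Extract name/price pairs with XPath expressions."""
    tree = html.fromstring(page)
    products = []
    for node in tree.xpath('//div[@class="product"]'):
        products.append({
            "name": node.xpath("./h2/text()")[0],
            "price": node.xpath('./span[@class="price"]/text()')[0],
        })
    return products


if __name__ == "__main__":
    for item in parse_products(SAMPLE):
        print(item)
```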

Posted 2 months ago

Apply
Page 2 of 2

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies