
110 Web Scraping Jobs - Page 5

Note: JobPe aggregates listings for easy access, but you apply directly on the original job portal.

2.0 - 4.0 years

4 - 6 Lacs

Mumbai, Hyderabad

Work from Office

Job Responsibilities:
• Collaborate with data scientists, software engineers, and business stakeholders to understand data requirements and design efficient data models.
• Develop, implement, and maintain robust and scalable data pipelines, ETL processes, and data integration solutions.
• Extract, transform, and load data from various sources, ensuring data quality, integrity, and consistency.
• Optimize data processing and storage systems to handle large volumes of structured and unstructured data efficiently.
• Perform data cleaning, normalization, and enrichment tasks to prepare datasets for analysis and modelling.
• Monitor data flows and processes; identify and resolve data-related issues and bottlenecks.
• Contribute to the continuous improvement of data engineering practices and standards within the organization.
• Stay up to date with industry trends and emerging technologies in data engineering, artificial intelligence, and dynamic pricing.

Candidate Profile:
• Strong passion for data engineering, artificial intelligence, and problem-solving.
• Solid understanding of data engineering concepts, data modeling, and data integration techniques.
• Proficiency in programming languages such as Python and SQL, plus web scraping.
• Understanding of NoSQL, relational, and in-memory databases and of technologies such as MongoDB, Redis, and Apache Spark is an added advantage.
• Knowledge of distributed computing frameworks and big data technologies (e.g., Hadoop, Spark) is a plus.
• Excellent analytical and problem-solving skills, with a keen eye for detail.
• Strong communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
• Self-motivated, quick learner, and adaptable to changing priorities and technologies.

(ref:hirist.tech)
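As context for the skill set this posting lists (Python, SQL, web scraping, ETL), a minimal extract-transform-load loop might look like the sketch below; the URL, CSS selectors, and table schema are hypothetical placeholders, not anything specified by the employer.

```python
# Minimal scrape -> transform -> load sketch. All selectors are placeholders.
import sqlite3

import requests
from bs4 import BeautifulSoup

URL = "https://example.com/products"  # hypothetical source page

def extract(url: str) -> list[dict]:
    """Fetch a page and pull out raw records."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"name": row.select_one(".name").get_text(strip=True),
         "price": row.select_one(".price").get_text(strip=True)}
        for row in soup.select(".product-row")  # hypothetical selector
    ]

def transform(records: list[dict]) -> list[tuple]:
    """Normalize prices to floats and drop incomplete rows."""
    cleaned = []
    for r in records:
        try:
            cleaned.append((r["name"], float(r["price"].strip("$").replace(",", ""))))
        except (KeyError, ValueError):
            continue  # skip malformed rows rather than failing the whole batch
    return cleaned

def load(rows: list[tuple]) -> None:
    """Persist the cleaned rows to a local SQLite table."""
    with sqlite3.connect("products.db") as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
        conn.executemany("INSERT INTO products VALUES (?, ?)", rows)

if __name__ == "__main__":
    load(transform(extract(URL)))
```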

Posted 2 months ago

Apply

3 - 6 years

20 - 27 Lacs

Pune

Remote

Data Acquisition & Web Application Developer

Experience: 3-6 years | Salary: USD 1,851-2,962 / month | Preferred Notice Period: Within 30 days | Shift: 10:00 AM to 7:00 PM IST | Opportunity Type: Remote | Placement Type: Permanent
(Note: This is a requirement for one of Uplers' clients.)

Must-have skills: APIs, data acquisition, web scraping, Agile, Python
Good-to-have skills: analytics, monitoring, stream processing, web application deployment, Node.js

GPRO Ltd (one of Uplers' clients) is looking for a Data Acquisition & Web Application Developer who is passionate about their work, eager to learn and grow, and committed to delivering exceptional results. If you are a team player with a positive attitude and a desire to make a difference, we want to hear from you.

About the Project: We are seeking a skilled full-stack developer to build a specialised web application that aggregates and presents public information on individuals, such as company executives and leaders. This tool will serve as a comprehensive profile generator, pulling data from diverse online sources including news outlets, social media, and other platforms. The primary goal is to provide users with a centralised, easily navigable view of a person's online presence, latest news, and public information.

Project Overview: The core of this project is a robust data acquisition layer capable of scraping and integrating information from various online sources. This data will be presented through a user-friendly web interface. The application should allow users to input a person's name and receive an aggregated view of relevant public data.

Key Responsibilities:
• Develop and implement the data acquisition layer: design and build systems to scrape and collect data from specified sources, including news websites (e.g., Bloomberg.com, Reuters, BBC.com, Financial Times), social media (e.g., X, LinkedIn), and media platforms (e.g., YouTube, podcasts).
• Integrate with APIs: use official APIs (e.g., Bloomberg, Reuters, Financial Times, Google Finance) where available and prioritized. Evaluate and integrate third-party scraping APIs (e.g., Apify, Oxylabs, SerpApi) as necessary, considering associated risks and subscription models.
• Handle a hybrid approach: leverage licensed APIs for premium sources while potentially using third-party scrapers for others, being mindful of terms of service and legal/ethical considerations. Direct scraping of highly protected sites such as Bloomberg, Reuters, and FT should be avoided or approached with extreme caution via third-party services.
• Design data storage and indexing: determine appropriate data storage solutions, considering the volume of data and its relevance over time. Implement indexing and caching mechanisms to ensure efficient search and retrieval, supporting near real-time data presentation.
• Develop the web application front end: build a basic, functional interface similar to the provided examples ("Opening Screen," "Person profile"), displaying the aggregated information clearly.
• Implement user functionality: enable users to input a person's name for searching, sort displayed outputs by date, click through links to the original source of information, and navigate to a new search easily (e.g., via a tab).
• Consider stream processing: evaluate and potentially implement stream processing techniques for handling near real-time data acquisition and updates.
• Ensure scalability: design the application to support a specified level of concurrent searches (estimated at 200 for the initial phase).
• Build a business informational layer: develop a component that tracks the usage of different data services (APIs, scrapers) for monitoring costs and informing future scaling decisions.
• Technical documentation: provide clear documentation for the developed system, including data flows, API integrations, and deployment notes.

Required Skills and Experience:
• Proven experience in web scraping and data acquisition from diverse online sources.
• Strong proficiency in developing with APIs, including handling different authentication methods and data formats.
• Experience with relevant programming languages and frameworks for web development and data processing (e.g., Python, Node.js).
• Knowledge of database design and data storage solutions.
• Familiarity with indexing and caching strategies for search applications.
• Understanding of potential challenges in web scraping (e.g., anti-scraping measures, terms of service).
• Experience building basic web application front ends.
• Ability to consider scalability and performance in system design.
• Strong problem-solving skills and the ability to work independently or as part of a small team.
• Experience working with foreign (Western-based) startups and clients; ability to work in agile environments and pivot fast.

Desirable Skills:
• Experience with stream processing technologies.
• Familiarity with deploying and managing web applications (infrastructure design is flexible).
• Experience with monitoring and analytics for application usage.

How to apply for this opportunity (easy 3-step process):
1. Click on Apply and register or log in on our portal.
2. Upload an updated resume and complete the screening form.
3. Increase your chances of getting shortlisted and meet the client for the interview!

About Our Client: A web app aggregating real-time information on individuals for financial services professionals.

About Uplers: Our goal is to make hiring and getting hired reliable, simple, and fast. Our role is to help talent find and apply for relevant product and engineering job opportunities and progress in their careers. (Note: There are many more opportunities on the portal.) So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!
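To make the aggregation layer described above concrete, here is a minimal sketch of fanning a name out to several sources, caching results, and merging them sorted by date. The fetcher functions and their return shapes are hypothetical stand-ins for the real news/social API integrations the project calls for, not part of the posting.

```python
# Illustrative aggregation sketch: query sources, cache, merge newest-first.
import time
from typing import Callable

CACHE_TTL = 300  # seconds; a simple TTL cache supports near-real-time reuse
_cache: dict[str, tuple[float, list[dict]]] = {}

def fetch_news(name: str) -> list[dict]:
    # Placeholder: in practice this would call a licensed news API.
    return [{"source": "news", "title": f"{name} in the headlines",
             "date": "2024-05-01", "url": "https://example.com/1"}]

def fetch_social(name: str) -> list[dict]:
    # Placeholder: in practice this would call a social-media API or scraper.
    return [{"source": "social", "title": f"{name} posted an update",
             "date": "2024-05-03", "url": "https://example.com/2"}]

SOURCES: list[Callable[[str], list[dict]]] = [fetch_news, fetch_social]

def profile(name: str) -> list[dict]:
    """Return an aggregated, date-sorted view of a person's public data."""
    hit = _cache.get(name)
    if hit and time.time() - hit[0] < CACHE_TTL:
        return hit[1]  # serve repeat searches from cache
    items = [item for source in SOURCES for item in source(name)]
    items.sort(key=lambda i: i["date"], reverse=True)  # newest first
    _cache[name] = (time.time(), items)
    return items

print(profile("Jane Doe"))
```

In a production version, the per-source usage counts could feed the "business informational layer" the posting mentions for cost monitoring.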

Posted 2 months ago

Apply

3 - 6 years

6 - 10 Lacs

Noida

Work from Office

Python Developer
Location: Sector 1, Noida (Work from Office)
Experience: Minimum 3 years
Education: B.E./B.Tech

Primary Role:
• Perform web scraping and crawling to extract and structure data from various websites.
• Handle data cleaning, transformation, and storage in structured formats.
• Write efficient and scalable Python scripts to manage high-volume data extraction tasks.
• Monitor and manage log files using automation scripts.

Key Skills:
• Proficiency in Python with hands-on experience in web scraping and crawling.
• Strong working knowledge of BeautifulSoup, Selenium, NumPy, Pandas, and Pytest.
• Good understanding of JavaScript, HTML, and SQL (preferably MS SQL).
• Experience with MongoDB is an added advantage.
• Ability to integrate multiple data sources and databases into a single pipeline.
• Solid understanding of Python threading and multiprocessing, event-driven programming, and scalable, modular application design.

Preferred Skills:
• Practical experience writing and maintaining web crawlers and scrapers.
• Familiarity with anti-bot mechanisms and techniques to bypass them responsibly.
• Exposure to handling large datasets and ensuring data accuracy and completeness.
• Experience with automated testing using Pytest.
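Since this role highlights Python threading for high-volume extraction, here is a minimal sketch of concurrent page fetching with a thread pool (scraping is I/O-bound, so threads help despite the GIL). The URLs are hypothetical placeholders.

```python
# Concurrent fetching sketch: a thread pool downloads many pages in parallel.
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests
from bs4 import BeautifulSoup

URLS = [f"https://example.com/page/{i}" for i in range(1, 21)]  # placeholders

def fetch_title(url: str) -> str:
    resp = requests.get(url, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else ""

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(fetch_title, u): u for u in URLS}
    for fut in as_completed(futures):
        url = futures[fut]
        try:
            print(url, "->", fut.result())
        except requests.RequestException as exc:
            print(url, "failed:", exc)  # log and continue; don't kill the batch
```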

Posted 2 months ago

Apply

5 - 8 years

15 - 25 Lacs

Pune

Hybrid

Role & Responsibilities:
• Data Pipeline Development: Design, develop, and maintain data pipelines using Google Cloud Platform (GCP) services such as Dataflow, Dataproc, and Pub/Sub.
• Data Ingestion & Transformation: Build and implement data ingestion and transformation processes using tools such as Apache Beam and Apache Spark.
• Data Storage Management: Optimize and manage data storage solutions on GCP, including BigQuery, Cloud Storage, and Cloud SQL.
• Security Implementation: Implement data security protocols and access controls with GCP's Identity and Access Management (IAM) and Cloud Security Command Center.
• System Monitoring & Troubleshooting: Monitor and troubleshoot data pipelines and storage solutions using GCP's Stackdriver and Cloud Monitoring tools.
• Generative AI Systems: Develop and maintain scalable systems for deploying and operating generative AI models, ensuring efficient use of computational resources.
• Gen AI Capability Building: Build generative AI capabilities among engineers, covering areas such as knowledge engineering, prompt engineering, and platform engineering.
• Knowledge Engineering: Gather and structure domain-specific knowledge for effective use by large language models (LLMs).
• Prompt Engineering: Design effective prompts to guide generative AI models, ensuring relevant, accurate, and creative text output.
• Collaboration: Work with data experts, analysts, and product teams to understand data requirements and deliver tailored solutions.
• Automation: Automate data processing tasks using scripting languages such as Python.
• Best Practices: Participate in code reviews and contribute to establishing best practices for data engineering within GCP.
• Continuous Learning: Stay current with GCP service innovations and advancements, particularly core data services (GCS, BigQuery, Cloud Storage, Dataflow, etc.).

Skills and Experience:
• Experience: 5+ years in data engineering or similar roles.
• Proficiency in GCP: Expertise in designing, developing, and deploying data pipelines, with strong knowledge of GCP core data services (GCS, BigQuery, Cloud Storage, Dataflow, etc.).
• Generative AI & LLMs: Hands-on experience with generative AI models and large language models (LLMs) such as GPT-4, Llama 3, and Gemini 1.5, with the ability to integrate these models into data pipelines and processes.
• Experience in web scraping.
• Technical Skills: Strong proficiency in Python and SQL for data manipulation and querying. Experience with distributed data processing frameworks such as Apache Beam or Apache Spark is a plus.
• Security Knowledge: Familiarity with data security and access control best practices.
• Collaboration: Excellent communication and problem-solving skills, with a demonstrated ability to collaborate across teams.
• Project Management: Ability to work independently, manage multiple projects, and meet deadlines.
• Preferred Knowledge: Familiarity with sustainable finance, ESG risk, CSRD, regulatory reporting, cloud infrastructure, and data governance best practices.
• Bonus Skills: Knowledge of Terraform is a plus.

Education and Certification:
• Degree: Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
• Experience: 3-5 years of hands-on experience in data engineering.
• Certification: Google Professional Data Engineer.
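For the Apache Beam pipelines this role centers on, a minimal sketch looks like the following; it runs locally on the DirectRunner with toy input, whereas a Dataflow job would swap in real sources and a BigQuery sink. The data values are placeholders.

```python
# Minimal Apache Beam pipeline sketch: create records, parse, filter, output.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["alpha,1", "beta,2", "gamma,3"])
        | "Parse" >> beam.Map(lambda line: line.split(","))
        | "ToDict" >> beam.Map(lambda parts: {"name": parts[0], "value": int(parts[1])})
        | "Filter" >> beam.Filter(lambda row: row["value"] > 1)
        | "Print" >> beam.Map(print)  # in production: beam.io.WriteToBigQuery(...)
    )
```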

Posted 2 months ago

Apply

3 - 7 years

7 - 17 Lacs

Ahmedabad

Work from Office

What we are looking for:
The ideal candidate will possess hands-on expertise in designing and deploying advanced web scraping solutions, leveraging Node.js and other technologies. A significant focus will be on overcoming bot detection challenges, building scalable and resilient scraping systems, and ensuring the efficiency and scalability of data acquisition pipelines. This is a highly technical, hands-on role ideal for someone passionate about solving complex scraping and infrastructure challenges.

Things you will be doing:

Advanced Web Scraping:
• Develop and maintain high-performance scraping systems using Node.js, Python, or other relevant technologies.
• Handle JavaScript-heavy and asynchronous content using tools like Puppeteer, Playwright, or custom solutions in Node.js.
• Implement advanced bot detection bypass techniques, including CAPTCHA solving (via automation, AI/ML, or third-party services), advanced proxy management and IP rotation strategies, and user-agent, cookie, and header spoofing.
• Build robust error-handling mechanisms to adapt to changes in website structures or anti-scraping measures.

Bot Detection and Anti-Scraping Expertise:
• Analyze and reverse-engineer advanced bot detection systems and anti-scraping mechanisms, including rate limiting, behavioral analysis, and fingerprinting.
• Design and implement techniques to bypass WAFs (Web Application Firewalls) and server-side protections using Node.js libraries and tools.
• Monitor, log, and analyze bot detection patterns to ensure system adaptability.
• Create innovative solutions to blend scraping traffic with legitimate user behavior.

Infrastructure and Networking:
• Architect and maintain scalable infrastructure using containerization tools like Docker and orchestration platforms such as Kubernetes.
• Leverage cloud platforms (AWS, GCP, Azure) for distributed scraping and data acquisition.
• Use Node.js and related tools to optimize network configurations for high-throughput scraping, including proxy and load balancer configurations.
• Automate deployment and scaling of scraping systems using CI/CD pipelines.

Performance and Optimization:
• Ensure optimal performance of scraping systems by reducing latency and optimizing resource utilization.
• Develop robust monitoring and logging systems to track and troubleshoot issues in real time.
• Optimize pipelines for scalability, fault tolerance, and high availability.

Compliance and Security:
• Ensure adherence to legal, ethical, and regulatory standards (e.g., GDPR, CCPA) for all scraping activities.
• Safeguard data acquisition systems from detection, blocking, and external threats.
• Respect website terms of service while implementing efficient scraping solutions.

Skills you need in order to succeed in this role:

Technical Skills:
• 3+ years of hands-on experience in web scraping or data engineering.
• Expertise in Node.js for building and optimizing scraping systems.
• Deep expertise in handling advanced bot detection systems and anti-scraping mechanisms.
• Strong knowledge of programming languages such as Python and JavaScript.
• Advanced understanding of networking concepts, including HTTP/HTTPS protocols, WebSockets, DNS, and API integrations.
• Experience with containerization tools (Docker) and orchestration platforms (Kubernetes).
• Proficiency in cloud platforms (AWS, GCP, Azure) for scalable data acquisition pipelines.
• Familiarity with tools like Puppeteer, Playwright, Scrapy, or Selenium.

Problem-Solving Expertise:
• Proven ability to reverse-engineer anti-bot measures such as CAPTCHA, IP blocks, and fingerprinting.
• Strong debugging and optimization skills for network and scraping pipelines.
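As a small illustration of the JavaScript-heavy scraping this role describes, the sketch below drives a page with Playwright (Python bindings), setting a custom user agent and an upstream proxy. The proxy address, target URL, and selector are hypothetical placeholders.

```python
# Playwright sketch: render a JS-heavy page behind a proxy with a custom UA.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={"server": "http://proxy.example.com:8080"},  # placeholder proxy
    )
    context = browser.new_context(
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    )
    page = context.new_page()
    page.goto("https://example.com/spa", wait_until="networkidle")
    page.wait_for_selector(".results")  # wait for client-side rendering
    print(page.inner_text(".results"))
    browser.close()
```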

Posted 2 months ago

Apply

4 - 8 years

10 - 15 Lacs

Nagpur, Pune, Mumbai (All Areas)

Work from Office

Job Overview:
The ideal candidate will have strong Python programming skills and experience with web scraping frameworks and libraries like Requests, BeautifulSoup, Selenium, Playwright, or urllib. You will be responsible for building efficient and scalable web scrapers, extracting valuable data, and ensuring data integrity. This role requires a keen eye for problem-solving, the ability to work with complex data structures, and a strong understanding of web technologies like HTML, CSS, DOM, XPath, and regular expressions. Knowledge of JavaScript would be an added advantage.

Responsibilities:
• Apply your knowledge to fetch data from multiple online sources.
• Develop highly reliable web scrapers and parsers across various websites.
• Extract structured/unstructured data and store it in SQL/NoSQL data stores.
• Work closely with Project/Business/Research teams to provide scraped data for analysis.
• Maintain the scraping projects delivered to production.
• Develop frameworks for automating and maintaining a constant flow of data from multiple sources.
• Work independently with minimum supervision.
• Develop a deep understanding of data sources on the web and know exactly how, when, and which data to scrape, parse, and store.

Required Skills and Experience:
• 1 to 2 years of experience as a web scraper.
• Proficient knowledge of Python and working knowledge of web crawling/web scraping in Python with Requests, BeautifulSoup or urllib, and Selenium or Playwright.
• Strong knowledge of basic Linux commands for system navigation, management, and troubleshooting.
• Expertise in proxy usage to ensure secure and efficient network operations.
• Experience with captcha-solving techniques for seamless automation and data extraction.
• Experience with data parsing: strong knowledge of regular expressions, HTML, CSS, DOM, and XPath. Knowledge of JavaScript would be a plus.
• Ability to access, manipulate, and transform data from a variety of database and flat-file sources. MongoDB and MySQL skills are essential.
• Ability to develop reusable code-based scraping products that can be used by others.
• Git knowledge is mandatory for version control and collaborative development workflows.
• Experience handling cloud servers on platforms like AWS, GCP, and Leapswitch for scalable and reliable infrastructure management.
• Ability to ask the right questions and deliver the right results in a way that is understandable and usable to your clients.
• A track record of digging into tough problems, attacking them from different angles, and bringing innovative approaches to bear is highly desirable. Must be capable of self-teaching new techniques.

Behavioural expectations:
• Be excited by, and keep a positive outlook while navigating, ambiguity.
• Passion for results and excellence.
• Team player: able to get the job done by working collaboratively with others.
• Inquisitive, analytical mind; out-of-the-box thinking.
• Prioritize among competing opportunities, balance consumer needs with business and product priorities, and clearly articulate the rationale behind product decisions.
• Straightforward and professional.
• Good communicator.
• Maintain high energy and motivation.
• A do-it-yourself orientation, consistent with the company's "roll-up-the-sleeves" culture.
• Proactive.

SLO Technologies Private Limited
Registered Office: IQS Tower, 5th Floor, Baner Road, Baner, Pune, Maharashtra 411045
CIN: U74120MH2015PTC267292 | Phone: 7900151368 / 8652865168 | Email: info@advarisk.com | Website: www.advarisk.com
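To illustrate the proxy and user-agent handling this posting requires, here is a minimal rotation sketch with the requests library; the proxy addresses are hypothetical placeholders.

```python
# Rotation sketch: cycle proxies round-robin and randomize the user agent.
import itertools
import random

import requests

PROXIES = itertools.cycle([
    "http://proxy1.example.com:8080",  # placeholder proxies
    "http://proxy2.example.com:8080",
])
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/124.0",
]

def get(url: str) -> requests.Response:
    proxy = next(PROXIES)                                  # round-robin proxy
    headers = {"User-Agent": random.choice(USER_AGENTS)}   # random UA per call
    return requests.get(url, headers=headers,
                        proxies={"http": proxy, "https": proxy}, timeout=20)

resp = get("https://example.com")
print(resp.status_code)
```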

Posted 2 months ago

Apply

2 - 4 years

5 - 12 Lacs

Nagpur, Pune, Mumbai (All Areas)

Work from Office

Role & responsibilities

Job Overview:
We are looking for a highly motivated Junior Data Engineer with a passion for web scraping and web crawling to join our team. The ideal candidate will have strong Python programming skills and experience with web scraping frameworks and libraries like Requests, BeautifulSoup, Selenium, Playwright, or urllib. You will be responsible for building efficient and scalable web scrapers, extracting valuable data, and ensuring data integrity. This role requires a keen eye for problem-solving, the ability to work with complex data structures, and a strong understanding of web technologies like HTML, CSS, DOM, XPath, and regular expressions. Knowledge of JavaScript would be an added advantage.

Responsibilities:
• Apply your knowledge to fetch data from multiple online sources.
• Develop highly reliable web scrapers and parsers across various websites.
• Extract structured/unstructured data and store it in SQL/NoSQL data stores.
• Work closely with Project/Business/Research teams to provide scraped data for analysis.
• Maintain the scraping projects delivered to production.
• Develop frameworks for automating and maintaining a constant flow of data from multiple sources.
• Work independently with minimum supervision.
• Develop a deep understanding of data sources on the web and know exactly how, when, and which data to scrape, parse, and store.

Required Skills and Experience:
• 1 to 2 years of experience as a web scraper.
• Proficient knowledge of Python and working knowledge of web crawling/web scraping in Python with Requests, BeautifulSoup or urllib, and Selenium or Playwright.
• Strong knowledge of basic Linux commands for system navigation, management, and troubleshooting.
• Expertise in proxy usage to ensure secure and efficient network operations.
• Experience with captcha-solving techniques for seamless automation and data extraction.
• Experience with data parsing: strong knowledge of regular expressions, HTML, CSS, DOM, and XPath. Knowledge of JavaScript would be a plus.

Preferred candidate profile:
• Ability to access, manipulate, and transform data from a variety of database and flat-file sources. MongoDB and MySQL skills are essential.
• Ability to develop reusable code-based scraping products that can be used by others.
• Git knowledge is mandatory for version control and collaborative development workflows.
• Experience handling cloud servers on platforms like AWS, GCP, and LEAPSWITCH for scalable and reliable infrastructure management.
• Ability to ask the right questions and deliver the right results in a way that is understandable and usable to your clients.
• A track record of digging into tough problems, attacking them from different angles, and bringing innovative approaches to bear is highly desirable. Must be capable of self-teaching new techniques.

Behavioural expectations:
• Be excited by, and keep a positive outlook while navigating, ambiguity.
• Passion for results and excellence.
• Team player: able to get the job done by working collaboratively with others.
• Inquisitive, analytical mind; out-of-the-box thinking.
• Prioritize among competing opportunities, balance consumer needs with business and product priorities, and clearly articulate the rationale behind product decisions.
• Straightforward and professional.
• Good communicator.
• Maintain high energy and motivation.
• A do-it-yourself orientation, consistent with the company's "roll-up-the-sleeves" culture.
• Proactive.
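For the XPath and regular-expression parsing skills this posting emphasizes, a minimal sketch with lxml might look like this; the HTML fragment and field names are invented for illustration.

```python
# Parsing sketch: extract fields with XPath, then normalize with a regex.
import re

from lxml import html

DOC = html.fromstring("""
<div class="listing">
  <span class="title">Data Engineer</span>
  <span class="salary">Rs. 5,00,000 - 12,00,000 p.a.</span>
</div>
""")

title = DOC.xpath('//span[@class="title"]/text()')[0]
salary_text = DOC.xpath('//span[@class="salary"]/text()')[0]
# Pull the numeric bounds, stripping Indian-style digit grouping commas.
low, high = (int(s.replace(",", "")) for s in re.findall(r"[\d,]+", salary_text))
print(title, low, high)  # Data Engineer 500000 1200000
```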

Posted 2 months ago

Apply

3 - 7 years

20 - 25 Lacs

Gurugram

Work from Office

Develop the capability to efficiently scrape data from multiple web sources. Scrape difficult websites by deploying anti-blocking and anti-captcha tools. Knowledge of mobile request monitoring and of using the robots.txt file is required.

Required candidate profile: Proxy rotation and user-agent rotation. Knowledge of request monitoring with tools like Fiddler and Charles. Knowledge of setting up and rooting a virtual device, e.g., with MuMu Player.
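Regarding the robots.txt knowledge mentioned above, Python's standard library can check whether a given user agent is allowed to fetch a URL before scraping it; the domain and bot name below are placeholders.

```python
# robots.txt sketch: consult the site's crawl rules before fetching a URL.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # downloads and parses the robots.txt file

url = "https://example.com/private/page"
if rp.can_fetch("MyScraperBot", url):
    print("allowed to fetch", url)
else:
    print("disallowed by robots.txt:", url)
```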

Posted 2 months ago

Apply

2 - 3 years

4 - 5 Lacs

Bengaluru

Work from Office

As a skilled developer, you will build tools and applications that utilize the data held within company databases. The primary responsibility is to design and develop these layers of our applications and to coordinate with the rest of the team working on different layers of the IT infrastructure. A commitment to collaborative problem solving, sophisticated design, and quality products is essential.

Python Developer — Necessary Skills:
• Experience in data wrangling and manipulation with Python/Pandas.
• Experience with Docker containers.
• Knowledge of data structures, algorithms, and data modeling.
• Experience with versioning (Git, Azure DevOps).
• Design and implementation of ETL/ELT pipelines.
• Good knowledge of and experience with web scraping (Scrapy, BeautifulSoup, Selenium).
• Expertise in at least one popular Python framework (such as Django, Flask, or Pyramid).
• Design, build, and maintain efficient, reusable, and reliable Python code (SOLID, design principles).
• Experience with SQL databases (views, stored procedures, etc.).

Responsibilities and Activities:
Aside from the core development role, this position includes auxiliary duties that are not related to development. The role includes, but is not limited to:
• Support and maintenance of custom and previously developed tools, as well as excellent performance and responsiveness of new applications.
• Deliver high-quality and reliable applications, covering both development and front end; maintain code quality, prioritize organization, and drive automation.
• Participate in peer review of plans, technical solutions, and related documentation (map/document technical procedures).
• Identify security issues, bottlenecks, and bugs, implementing solutions to mitigate and address issues of service data security and data breaches.
• Work with SQL/Postgres databases: install and maintain database systems and support server management, including backups. Troubleshoot issues raised by the Data Processing team.
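For the Scrapy experience this posting asks for, a minimal spider looks like the sketch below; it targets the public quotes.toscrape.com practice site rather than anything specific to this employer.

```python
# Minimal Scrapy spider: extract quote text and author, follow pagination.
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "next" link until the site runs out of pages.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

Run it with, for example, `scrapy runspider quotes_spider.py -o quotes.json` to collect the results as JSON.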

Posted 2 months ago

Apply

2 - 5 years

5 - 10 Lacs

Ahmedabad

Work from Office

Roles & responsibilities:
• Develop and maintain applications using JavaScript.
• Collaborate with product managers and other developers to translate requirements into functional solutions.
• Write clean, efficient, and reusable code that adheres to industry best practices and coding standards.
• Troubleshoot and debug issues reported by clients or internal teams, ensuring timely resolution.
• Participate in agile development processes, including sprint planning, task estimation, and daily stand-ups.
• Participate in team-building and organizational activities.
• Be a supportive team member, adaptive to the company culture.
• Contribute toward achieving organizational goals.
• Help foster a healthy work culture and an awesome workplace.

Key skills required:
• Minimum of 2 years of experience in JavaScript development (with any libraries or frameworks).
• Proficiency in HTML and CSS.
• Strong understanding of core JavaScript concepts, including DOM manipulation, asynchronous programming, and event handling.
• Proficient debugging and troubleshooting skills.
• Understanding of software development principles, including modularization, separation of concerns, and code reusability.
• Excellent problem-solving skills and attention to detail.
• Ability to work independently as well as part of a team.
• Good communication skills and ability to collaborate effectively with team members.
• Fast learner, able to adapt to changes in technologies.
• Familiarity with Chrome extension APIs; experience building Chrome extensions is a plus.
• Knowledge of C# development is good to have.

Perks and benefits:
• Excellent base salary
• Flexible working hours
• 5-day work week
• Work-life-balance culture
• Annual performance bonus
• Company outings
• Family health insurance
• Lunch, snacks, and other benefits

Posted 2 months ago

Apply