Posted: 3 days ago | Platform: LinkedIn


Work Mode: On-site

Job Type: Full Time

Job Description

Python Developer – Web Scraping & Data Processing


About the Role


We are seeking a skilled and detail-oriented Python Developer with hands-on experience in web scraping, document parsing (PDF, HTML, XML), and structured data extraction. You will be part of a core team working on aggregating biomedical content from diverse sources, including grant repositories, scientific journals, conference abstracts, treatment guidelines, and clinical trial databases.


Key Responsibilities

• Develop scalable Python scripts to scrape and parse biomedical data from websites, pre-print servers, citation indexes, journals, and treatment guidelines.

• Build robust modules for splitting multi-record documents (PDFs, HTML, etc.) into individual content units (a scraping-and-splitting sketch follows this list).

• Implement NLP-based field extraction pipelines using libraries such as spaCy or NLTK, or regex rules, for metadata tagging.

• Design and automate workflows using schedulers like cron, Celery, or Apache Airflow for periodic scraping and updates.

• Store parsed data in relational (PostgreSQL) or NoSQL (MongoDB) databases with efficient schema design.

• Ensure robust logging, exception handling, and content quality validation across all processes.
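
For illustration, a minimal sketch of the kind of scraping-and-splitting module described above, assuming a hypothetical listing page whose records sit in <article> elements; the URL and CSS selectors are placeholders, not a real biomedical source:

```python
# Minimal scraping/splitting sketch. The URL and CSS selectors below are
# hypothetical placeholders rather than a real biomedical source.
import requests
from bs4 import BeautifulSoup


def fetch_records(url: str) -> list[dict]:
    """Download one listing page and split it into one dict per record."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    records = []
    for article in soup.select("article"):  # one content unit per <article>
        title = article.find("h2")
        abstract = article.find("p", class_="abstract")
        records.append({
            "title": title.get_text(strip=True) if title else None,
            "abstract": abstract.get_text(strip=True) if abstract else None,
        })
    return records


if __name__ == "__main__":
    for record in fetch_records("https://example.org/guidelines"):
        print(record)
```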


Required Skills and Qualifications

• 3+ years of hands-on experience in Python, especially for data extraction, transformation, and loading (ETL).

• Strong command of web scraping libraries: BeautifulSoup, Scrapy, Selenium, Playwright.

• Proficiency in PDF parsing libraries: PyMuPDF, pdfminer.six, pdfplumber (a parsing sketch follows this list).


• Experience with HTML/XML parsing tools: lxml, XPath, html5lib.

• Familiarity with regular expressions, NLP, and field extraction techniques.

• Working knowledge of SQL and/or NoSQL databases (MySQL, PostgreSQL, MongoDB).

• Understanding of API integration (RESTful APIs) for structured data sources.

• Experience with task schedulers and workflow orchestrators (cron, Airflow, Celery).

• Comfortable with version control (Git/GitHub) and working in collaborative environments.
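
For illustration, a minimal sketch of PDF parsing plus regex-based field extraction, assuming PyMuPDF (imported as fitz) and documents whose body text carries clinical-trial registry IDs and DOIs; the patterns and field names are illustrative only:

```python
# Minimal PDF-parsing + field-extraction sketch using PyMuPDF and regex.
# The patterns and the choice of fields (NCT IDs, DOIs) are illustrative only.
import re
import fitz  # PyMuPDF

NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")        # ClinicalTrials.gov registry IDs
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")   # rough DOI pattern


def extract_fields(pdf_path: str) -> dict:
    """Pull page text with PyMuPDF, then tag simple metadata fields via regex."""
    with fitz.open(pdf_path) as doc:
        num_pages = doc.page_count
        text = "\n".join(page.get_text() for page in doc)
    return {
        "num_pages": num_pages,
        "nct_ids": sorted(set(NCT_PATTERN.findall(text))),
        "dois": sorted(set(DOI_PATTERN.findall(text))),
    }
```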


Good to Have

• Exposure to biomedical or healthcare data parsing (e.g., abstracts, clinical trials, drug labels).

• Familiarity with cloud environments like AWS (Lambda, S3).

• Experience with data validation frameworks and building QA rules (a QA-rule sketch follows this list).

• Understanding of ontologies and taxonomies (e.g., UMLS, MeSH) for content tagging.
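
For illustration, a minimal sketch of plain-Python QA rules over parsed records; the field names and thresholds are hypothetical, and a real pipeline might use a validation framework instead:

```python
# Minimal QA-rule sketch over parsed records. Field names and thresholds
# are hypothetical placeholders.
def qa_issues(record: dict) -> list[str]:
    """Return a list of human-readable QA failures for one parsed record."""
    issues = []
    if not record.get("title"):
        issues.append("missing title")
    abstract = record.get("abstract") or ""
    if len(abstract) < 50:
        issues.append("abstract shorter than 50 characters")
    if not record.get("nct_ids"):
        issues.append("no clinical-trial ID found")
    return issues


records = [
    {"title": "Example record", "abstract": "Too short.", "nct_ids": []},
]
for rec in records:
    print(rec["title"], "->", qa_issues(rec) or "OK")
```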


Why Join Us

• Opportunity to work on cutting-edge biomedical data aggregation for large-scale AI and knowledge graph initiatives.

• Collaborative environment with a mission to improve access and insights from scientific literature.

• Flexible work arrangements and access to industry-grade tools and infrastructure.


