Experience: 3 years
Posted: 3 days ago
On-site
Full Time
We are seeking a skilled and detail-oriented Python Developer with hands-on experience in web scraping, document parsing (PDF, HTML, XML), and structured data extraction. You will be part of a core team working on aggregating biomedical content from diverse sources, including grant repositories, scientific journals, conference abstracts, treatment guidelines, and clinical trial databases.
• Develop scalable Python scripts to scrape and parse biomedical data from websites, pre-print servers, citation indexes, journals, and treatment guidelines.
• Build robust modules for splitting multi-record documents (PDFs, HTML, etc.) into individual content units.
• Implement NLP-based field extraction pipelines for metadata tagging, using spaCy, NLTK, or regular expressions.
• Design and automate workflows using schedulers like cron, Celery, or Apache Airflow for periodic scraping and updates.
• Store parsed data in relational (PostgreSQL) or NoSQL (MongoDB) databases with efficient schema design.
• Ensure robust logging, exception handling, and content quality validation across all processes.
• 3+ years of hands-on experience in Python, especially for data extraction, transformation, and loading (ETL).
• Strong command of web scraping libraries: BeautifulSoup, Scrapy, Selenium, Playwright.
• Proficiency in PDF parsing libraries: PyMuPDF, pdfminer.six, pdfplumber.
• Experience with HTML/XML parsing tools: lxml, html5lib, and XPath queries.
• Familiarity with regular expressions, NLP, and field extraction techniques.
• Working knowledge of SQL and/or NoSQL databases (MySQL, PostgreSQL, MongoDB).
• Understanding of API integration (RESTful APIs) for structured data sources.
• Experience with task schedulers and workflow orchestrators (cron, Airflow, Celery).
• Proficiency with version control using Git/GitHub, and comfort working in collaborative environments.
• Exposure to biomedical or healthcare data parsing (e.g., abstracts, clinical trials, drug labels).
• Familiarity with cloud environments such as AWS (Lambda, S3).
• Experience with data validation frameworks and building QA rules.
• Understanding of ontologies and taxonomies (e.g., UMLS, MeSH) for content tagging.
• Opportunity to work on cutting-edge biomedical data aggregation for large-scale AI and knowledge graph initiatives.
• Collaborative environment with a mission to improve access to, and insights from, scientific literature.
• Flexible work arrangements and access to industry-grade tools and infrastructure.
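For candidates who want a concrete picture of the splitting and field-extraction work listed above, here is a minimal sketch using only the Python standard library. The sample text, the "Abstract <number>" record delimiter, the field labels, and the function names are all hypothetical and chosen for illustration; real sources will have different layouts and will usually need a PDF or HTML parser in front of this step.

```python
import re

# Hypothetical sample: three conference abstracts concatenated into one text dump.
SAMPLE = """\
Abstract 101
Title: Statin use and stroke risk
Authors: A. Rao; B. Chen
Trial ID: NCT01234567

Abstract 102
Title: Early sepsis detection in the ICU
Authors: C. Silva
Trial ID: NCT07654321

Abstract 103
Title: Record with no authors line
"""

# Assumed layout: each record starts at a line of the form "Abstract <number>".
RECORD_START = re.compile(r"^Abstract[ \t]+(\d+)[ \t]*$", re.MULTILINE)
# Assumed layout: labeled fields of the form "Label: value", one per line.
FIELD = re.compile(r"^(Title|Authors|Trial ID):[ \t]*(.+)$", re.MULTILINE)
# Fields a record must contain to pass the (illustrative) quality check.
REQUIRED = ("Title", "Authors")


def split_records(text):
    """Yield (abstract_number, body) for each record found in the text."""
    starts = list(RECORD_START.finditer(text))
    for i, match in enumerate(starts):
        # A record body runs until the next record header, or to end of text.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        yield match.group(1), text[match.end():end].strip()


def extract_fields(body):
    """Return a dict of the labeled fields found in one record body."""
    return {label: value.strip() for label, value in FIELD.findall(body)}


def parse(text):
    """Split, extract, and flag records that are missing required fields."""
    records = []
    for number, body in split_records(text):
        fields = extract_fields(body)
        fields["abstract_no"] = number
        fields["missing"] = [name for name in REQUIRED if name not in fields]
        records.append(fields)
    return records


if __name__ == "__main__":
    for record in parse(SAMPLE):
        print(record)
```

Keeping the split step separate from the extraction step means a malformed record can be logged and skipped without losing the rest of the document, and the `missing` list gives a simple hook for the content quality checks mentioned in the responsibilities.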
eMedEvents - Global Marketplace for CME/CE