Python Developer - Optical Character Recognition

OrangeShark

0 years

0 Lacs

Mumbai Metropolitan Region

Posted: 5 months ago

Skills Required

python developer, recognition, OCR, data extraction, drive, AI, design, integration, Tesseract, AWS Textract, preprocessing, OpenCV, NumPy, parsing logic, enablement, collaboration, DevOps, monitoring, documentation, test code, compliance, automation, processing, PDF, API, debugging, layout, containerization, Docker, Git, GCP

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview

As a Python Developer specializing in OCR and data extraction, you will drive end-to-end delivery of scalable document-intelligence solutions. You'll partner with cross-functional teams to architect and ship robust pipelines that transform unstructured files into actionable data, leveraging cutting-edge OCR libraries and generative AI.

Responsibilities

Architect & Develop: Design and implement Python applications focused on OCR-driven data extraction workflows.
OCR Integration: Integrate and optimize Tesseract, EasyOCR, PaddleOCR and cloud-based services (AWS Textract, Google Vision).
Pre-Processing & Enhancement: Apply OpenCV, PIL and NumPy techniques to clean, segment, and enhance images for superior recognition accuracy.
Data Structuring: Build custom parsing logic to normalize, validate and store extracted information in downstream systems.
LLM & Gen AI Enablement: Incorporate Large Language Models (e.g., GPT-based, LayoutLM, Donut) to enrich document understanding, classification, and entity extraction.
Collaboration & Delivery: Liaise with backend, data science, and DevOps teams to ensure seamless CI/CD integration, monitoring, and support.
Quality & Documentation: Write modular, test-driven code; maintain clear documentation; and conduct peer reviews to uphold code quality and compliance.

Must-Have Skills

Expert proficiency in Python and core automation/data processing libraries (NumPy, Pandas).
Hands-on experience with OCR libraries (Tesseract, EasyOCR, PaddleOCR) and handling PDF/image parsing.
Solid understanding of RESTful API integration and large-scale document workflows.
Proficient in image preprocessing using OpenCV and PIL to maximize OCR accuracy.
Strong analytical, debugging, and problem-solving capabilities in production.

Nice to Have

Proven track record deploying LLM-driven document intelligence solutions (e.g., GPT, LLaMA) or Gen AI frameworks.
Experience working with LangChain and vector databases.
Familiarity with AI document models such as LayoutLM, Donut, TrOCR for advanced layout parsing.
Exposure to NLP tasks: document classification, entity recognition, semantic search.
Experience with containerization (Docker), version control (Git), and cloud platforms (AWS/GCP/Azure).

(ref:hirist.tech)
