
5 PyMuPDF Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

5.0 - 7.0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Job Title: Senior Python Developer - AI/ML Document Automation
Location: Hyderabad
Work Mode: Hybrid
Experience: 5+ Years

Job Summary: We are looking for a highly skilled Senior Python Developer with deep expertise in AI/ML and document automation. The ideal candidate will lead the design and development of intelligent systems for extracting and processing structured and unstructured data from documents such as invoices, receipts, contracts, and PDFs. This role involves both hands-on coding and architectural contributions to scalable automation platforms.

Roles and Responsibilities: Design and develop modular Python applications for document parsing and intelligent automation. Build and optimize ML/NLP pipelines for tasks like Named Entity Recognition (NER), classification, and layout-aware data extraction. Integrate rule-based and AI-driven techniques (e.g., regex, spaCy, PyMuPDF, Tesseract) to handle diverse document formats. Develop and deploy models via REST APIs using FastAPI or Flask, and containerize with Docker. Collaborate with cross-functional teams to define automation goals and data strategies. Conduct code reviews, mentor junior developers, and uphold best coding practices. Monitor model performance and implement feedback mechanisms for continuous improvement. Maintain thorough documentation of workflows, metrics, and architectural decisions.

Mandatory Skills: Expert in Python (OOP, asynchronous programming, modular design). Strong foundation in machine learning algorithms and natural language processing techniques. Hands-on experience with Scikit-learn, TensorFlow, PyTorch, and Hugging Face Transformers. Proficient in developing REST APIs using FastAPI or Flask. Experience in PDF/text extraction using PyMuPDF, Tesseract, or similar tools. Skilled in regex-based extraction and rule-based NER. Familiar with Git, Docker, and any major cloud platform (AWS, GCP, or Azure). Exposure to MLOps tools such as MLflow, Airflow, or LangChain.

Posted 3 days ago


3.0 - 5.0 years

4 - 9 Lacs

Pune

Work from Office

Role & responsibilities: Design, prototype, and deploy AI-driven applications leveraging LLMs (GPT-4, Perplexity, Claude, Gemini, etc.) and open-source transformer models. Lead or co-lead end-to-end AI/GenAI solutions, from data ingestion, entity extraction, and semantic search to user-facing interfaces. Implement RAG (Retrieval-Augmented Generation) architectures, knowledge grounding pipelines, and prompt orchestration logic. Fine-tune transformer models (BERT, RoBERTa, T5, LLaMA) on custom datasets for use cases like document understanding, conversational AI, question answering, and summarization and topic modeling. Integrate LLM workflows into scalable backend architectures with APIs and frontends. Work closely with business teams and pharma SMEs to translate requirements into GenAI solutions. Mentor junior engineers and contribute to AI capability development across the organization.

Tech Stack & Skills Required:
Programming & Libraries: Python, FastAPI, LangChain, Pandas, PyTorch/TensorFlow, Transformers (HuggingFace), OpenAI SDK.
Data Extraction & Processing: PDFMiner, PyMuPDF, Tabula, PyPDF2, Tesseract OCR, python-pptx.
Gen AI / LLMs: OpenAI (GPT), Gemini, Perplexity, Cohere, DeepSeek, Mistral, LLaMA, BERT, RoBERTa, T5, Falcon.
Use Cases: NER, Summarization, QA, Document Parsing, Clustering, Topic Modeling, QA over docs.
Embedding & Vector Databases: Pinecone, FAISS, ChromaDB.
RAG & Retrieval Pipelines: LangChain, Haystack, custom retrievers.
Frontend/Backend Integration: React (preferred), FastAPI/Flask, REST/GraphQL APIs.
Versioning & Deployment: Git, Docker, CI/CD (basic); cloud knowledge is a plus (AWS/GCP/Azure).

Preferred candidate profile: Degree in Computer Science, Engineering, Data Science, or related field (BE/BTech/MTech/MCA). 3-5 years of hands-on experience in AI/ML/NLP/LLM solution development. Strong understanding of GenAI, Prompt Engineering, LLM internals, and multi-layered data architectures. Exposure to the pharma/healthcare domain is a significant plus. Excellent problem-solving skills, a self-learner, and the ability to work in cross-functional teams. Experience developing new applications within an agile environment preferred. Ability to work independently and as part of a team. Exposure to MLOps will be an added advantage.

Posted 1 week ago


0.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site

Ready to shape the future of work? At Genpact, we don't just adapt to change; we drive it. AI and digital innovation are redefining industries, and we're leading the charge. Genpact's , our industry-first accelerator, is an example of how we're scaling advanced technology solutions to help global enterprises work smarter, grow faster, and transform at scale. From large-scale models to , our breakthrough solutions tackle companies' most complex challenges. If you thrive in a fast-moving, tech-driven environment, love solving real-world problems, and want to be part of a team that's shaping the future, this is your moment.

Genpact (NYSE: G) is an advanced technology services and solutions company that delivers lasting value for leading enterprises globally. Through our deep business knowledge, operational excellence, and cutting-edge solutions, we help companies across industries get ahead and stay ahead. Powered by curiosity, courage, and innovation, our teams implement data, technology, and AI to create tomorrow, today.

Inviting applications for the role of Senior Principal Consultant - Generative AI - Application development Senior Developer.

We are looking for a Senior Application Developer to join our product engineering team. This role requires hands-on experience in designing and developing scalable application components with a strong focus on API development, middleware orchestration, and data transformation workflows. You will be responsible for building foundational components that integrate data pipelines, orchestration layers, and user interfaces, enabling next-gen digital and AI-powered experiences.

Key Responsibilities: Design, develop, and manage robust APIs and middleware services using Python frameworks like FastAPI and Uvicorn, ensuring scalable and secure access to platform capabilities. Develop end-to-end data transformation workflows and pipelines using LangChain, spacy, tiktoken, presidio-analyzer, and llm-guard, enabling intelligent content and data processing. Implement integration layers and orchestration logic for seamless communication between data sources, services, and UI using technologies like OpenSearch, boto3, requests-aws4auth, and urllib3. Work closely with UI/UX teams to integrate APIs into modern front-end frameworks such as ReactJS, Redux Toolkit, and Material UI. Build configurable modules for ingestion, processing, and output using Python libraries like PyMuPDF, openpyxl, and Unidecode for handling structured and unstructured data. Implement best practices for API security, data privacy, and anonymization using tools like presidio-anonymizer and llm-guard. Drive continuous improvement in performance, scalability, and reliability of the platform architecture.

Qualifications we seek in you:

Minimum Qualifications:
Experience in software development in enterprise/web applications
Languages & Frameworks: Python, JavaScript/TypeScript, FastAPI, ReactJS, Redux Toolkit
Libraries & Tools: langchain, presidio-analyzer, PyMuPDF, spacy, rake-nltk, inflection, openpyxl, tiktoken
APIs & Integration: FastAPI, requests, urllib3, boto3, opensearch-py, requests-aws4auth
UI/UX: ReactJS, Material UI, LESS
Cloud & DevOps: AWS SDKs, API gateways, logging, and monitoring frameworks (optional experience with serverless is a plus)

Preferred Qualifications: Strong understanding of API lifecycle management, REST principles, and microservices. Experience in data transformation, document processing, and middleware architecture. Exposure to AI/ML or Generative AI workflows using LangChain or OpenAI APIs. Prior experience working on secure and compliant systems involving user data. Experience in CI/CD pipelines, containerization (Docker), and cloud-native deployments (AWS preferred).

Why join Genpact?
Be a transformation leader - Work at the cutting edge of AI, automation, and digital innovation
Make an impact - Drive change for global enterprises and solve business challenges that matter
Accelerate your career - Get hands-on experience, mentorship, and continuous learning opportunities
Work with the best - Join 140,000+ bold thinkers and problem-solvers who push boundaries every day
Thrive in a values-driven culture - Our courage, curiosity, and incisiveness - built on a foundation of integrity and inclusion - allow your ideas to fuel progress

Come join the tech shapers and growth makers at Genpact and take your career in the only direction that matters: Up. Let's build tomorrow together.

Genpact is an Equal Opportunity Employer and considers applicants for all positions without regard to race, color, religion or belief, sex, age, national origin, citizenship status, marital status, military/veteran status, genetic information, sexual orientation, gender identity, physical or mental disability or any other characteristic protected by applicable laws. Genpact is committed to creating a dynamic work environment that values respect and integrity, customer focus, and innovation. Furthermore, please do note that Genpact does not charge fees to process job applications and applicants are not required to pay to participate in our hiring process in any other way. Examples of such scams include purchasing a 'starter kit,' paying to apply, or purchasing equipment or training.

Posted 2 weeks ago


6.0 - 10.0 years

20 - 35 Lacs

Bengaluru

Work from Office

Role: Data Scientist
Experience: 6-10 Years
Salary: Up to 35 L
Location: Bangalore
Notice Period: 30 days max (immediate joiners preferred)
Note: We are open to both Contract and FTE.

Job Description: We are seeking a Data Scientist with strong experience in natural language processing and generative AI to build intelligent systems that extract information from PDFs, power chatbots, and implement retrieval-augmented generation (RAG) solutions. The ideal candidate should have a solid foundation in machine learning and experience working with unstructured text data.

Key Responsibilities: Design and implement pipelines for extracting structured data from PDF documents using tools like PyMuPDF, PDFPlumber, or OCR libraries. Develop and fine-tune RAG pipelines combining vector databases (e.g., FAISS, Chroma) with LLMs for document-based question answering. Build and optimize conversational agents (chatbots) that use domain-specific data and generative AI models. Train and evaluate machine learning and NLP models for classification, entity extraction, summarization, etc. Collaborate with engineering and product teams to integrate AI capabilities into production systems. Monitor model performance and continually improve accuracy and relevance of generated responses.

Requirements: Strong Python skills and experience with libraries like Hugging Face Transformers, LangChain, or LlamaIndex. Strong knowledge of ABC cloud tools (AWS or Azure). Familiarity with vector databases and embedding models (OpenAI, Cohere, SentenceTransformers). Experience with PDF data extraction techniques and tools. Solid understanding of supervised/unsupervised ML, deep learning, and NLP techniques. Experience working with LLMs and knowledge of prompt engineering and fine-tuning. Bachelor's or Master's degree in Computer Science, Data Science, or a related field.

Posted 1 month ago


4.0 - 8.0 years

4 - 8 Lacs

Chennai, Tamil Nadu, India

On-site

Primary Responsibilities: Design and develop AI-driven web applications using Streamlit and LangChain. Implement multi-agent workflows with LangGraph. Integrate Claude 3 (via AWS Bedrock) into intelligent systems for document and image processing. Work with FAISS for vector search and similarity matching. Develop document integration solutions for PDF, DOCX, XLSX, PPTX, and image-based formats. Implement OCR and summarization features using EasyOCR, PyMuPDF, and AI models. Create features such as spell-check, chatbot accuracy tracking, and automatic re-training pipelines. Build secure apps with SSO authentication, transcript downloads, and reference link generation. Integrate external platforms like Confluence, SharePoint, ServiceNow, Veeva Vault, Outlook, G.Net/G.Share, and JIRA. Collaborate on architecture, performance optimization, and deployment.

Required Skills: Strong expertise in Streamlit, LangChain, LangGraph, and Claude 3 (AWS Bedrock). Hands-on experience with boto3, FAISS, EasyOCR, and PyMuPDF. Advanced skills in document parsing and image/video-to-text summarization. Proficient in modular architecture design and real-time AI response systems. Experience in enterprise integration with tools like ServiceNow, Confluence, Outlook, and JIRA. Familiar with chatbot monitoring and retraining strategies.

Secondary Skills: Working knowledge of PostgreSQL, JSON, and file I/O with Python libraries like os, io, time, datetime, and typing. Experience with dataclasses and numpy for efficient data handling and numerical process

Posted 1 month ago
