Posted: 11 hours ago | On-site | Full Time
Key Responsibilities
- Build robust document data extraction pipelines using NLP and OCR techniques.
- Develop and optimize end-to-end workflows for parsing scanned/image-based documents (PDFs, JPGs, TIFFs) and structured files (MS Excel, MS Word).
- Leverage LLMs (OpenAI GPT, Claude, Gemini, etc.) for advanced entity extraction, summarization, and classification tasks.
- Design and implement Python scripts for parsing, cleaning, and transforming data.
- Integrate with Azure services for document storage, compute, and secure API hosting (e.g., Azure Blob Storage, Azure Functions, Key Vault, Azure Cognitive Services).
- Deploy and orchestrate workflows in Azure Databricks (including Spark and ML pipelines).
- Build and manage API calls for model integration, handling rate limiting and token control through AI gateways.
- Automate the export of results into SQL/Oracle databases and enable downstream access for analytics and reporting.
- Handle diverse metadata requirements and create reusable, modular code for different document types.
- Optionally, visualize and report data using Power BI and export data to Excel for stakeholder review.

Required Skills & Qualifications

Technical Skills
- Strong programming skills in Python (pandas, Regex, Pytesseract, spaCy, LangChain, Transformers, etc.)
- Experience with Azure Cloud (Blob Storage, Function Apps, Key Vault, Logic Apps)
- Hands-on experience with Azure Databricks (PySpark, Delta Lake, MLflow)
- Familiarity with OCR tools such as Tesseract, Azure OCR, AWS Textract, or Google Vision API
- Proficiency in SQL and experience with Oracle Database integration (using cx_Oracle, SQLAlchemy, etc.)
- Experience working with LLM APIs (OpenAI, Anthropic, Google, or Hugging Face models)
- Knowledge of API development and integration (REST, JSON, API rate limits, authentication handling)
- Excel data manipulation using Python (e.g., openpyxl, pandas, xlrd)
- Understanding of Power BI dashboards and their integration with structured data sources

Nice to Have
- Experience with LangChain, LlamaIndex, or similar frameworks for document Q&A and retrieval-augmented generation (RAG)
- Background in data science or machine learning
- CI/CD and version control (Git, Azure DevOps)
- Familiarity with data governance and PII handling in document processing

Soft Skills
- Strong problem-solving skills and an analytical mindset
- Attention to detail and ability to work with messy/unstructured data
- Excellent communication skills for interacting with technical and non-technical stakeholders
- Ability to work independently and manage priorities in a fast-paced environment
EXL