Posted: 1 week ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Job Summary

As a Senior Data Scientist specializing in NLP, Generative AI, and Cloud technologies, you will be responsible for driving the development of data extraction solutions from documents at scale. This role requires advanced technical expertise in machine learning, NLP, and cloud computing, with a focus on automating document understanding processes and enhancing the quality of data extraction through state-of-the-art techniques. You will lead the design, implementation, and deployment of scalable NLP and AI models, mentor junior data scientists, and work collaboratively with cross-functional teams to deliver innovative solutions. This is a strategic role that requires both deep technical knowledge and leadership capabilities to shape the future of document data extraction within the organization.

Key Responsibilities

  • Lead Data Extraction Solutions: Design, implement, and scale advanced NLP and machine learning models for automating the extraction of structured data from a wide range of unstructured documents (e.g., PDFs, scanned images, contracts, reports).
  • Generative AI Expertise: Leverage Generative AI models (such as GPT, BERT, and related architectures) for tasks such as document summarization, content generation, and enhancing extracted data.
  • Cloud-Based Deployment: Architect and deploy data extraction models and workflows in cloud environments (AWS, Azure, GCP), ensuring scalability, reliability, and cost-efficiency.
  • Model Development & Optimization: Develop and fine-tune machine learning and NLP models, ensuring high performance in accuracy, efficiency, and robustness for real-world data extraction tasks.
  • Data Pipeline Design: Build and optimize end-to-end data pipelines, including data preprocessing, feature engineering, and model deployment, to process large-scale document datasets in the cloud.
  • Cross-Functional Collaboration: Work closely with product, engineering, and business teams to understand requirements, provide technical solutions, and deliver impactful data-driven results.
  • Research & Innovation: Stay up to date with the latest advancements in NLP, machine learning, and AI, applying cutting-edge research to improve data extraction methodologies.
  • Mentorship & Leadership: Lead and mentor a team of junior data scientists, providing guidance on best practices, model development, and cloud deployment.
  • Model Monitoring & Maintenance: Establish systems for monitoring model performance in production and ensure models are maintained and updated based on new data or changing requirements.
  • Compliance & Security: Ensure data processing and extraction workflows adhere to industry standards, data privacy regulations, and security protocols, particularly when working with sensitive information.

Required Skills & Qualifications

  • Experience: Minimum 8 years of experience as a Data Scientist or similar role, with a focus on NLP, machine learning, and AI. At least 3 years in a senior or lead capacity.
  • NLP & Document Processing Expertise: Proven experience applying NLP techniques such as Named Entity Recognition (NER), Optical Character Recognition (OCR), information extraction, document classification, and semantic analysis for data extraction from unstructured text.
  • Generative AI: Advanced knowledge of Generative AI models (e.g., GPT-3, BERT, T5) and experience applying them to real-world document and text processing tasks.
  • Cloud Technologies: Extensive experience with cloud platforms (AWS, Azure, or GCP) for deploying data pipelines, managing machine learning models, and processing large datasets.
  • Programming Skills: Proficiency in Python and libraries such as spaCy, Hugging Face Transformers, TensorFlow, PyTorch, and scikit-learn.
  • Data Pipeline & DevOps Tools: Hands-on experience with building, optimizing, and deploying data pipelines in cloud environments, including tools like Docker, Kubernetes, Apache Airflow, and MLflow.
  • Data Handling & Analysis: Expertise in data manipulation and analysis using tools such as Pandas, NumPy, and SQL, and ability to work with large datasets.
  • Leadership & Communication: Strong leadership and mentoring abilities, with excellent written and verbal communication skills to explain complex technical concepts to non-technical stakeholders.
  • Problem Solving: Exceptional problem-solving skills with a creative approach to tackling challenges related to document data extraction.
  • Collaboration: Experience working in a collaborative, cross-functional team environment to deliver end-to-end solutions.

Preferred Qualifications

  • Advanced Degree: Master’s or PhD in Computer Science, Data Science, Artificial Intelligence, or a related field.
  • Advanced NLP Techniques: Experience with state-of-the-art NLP methods such as transfer learning, attention mechanisms, and reinforcement learning applied to document data extraction.
  • Compliance Experience: Familiarity with legal, financial, or healthcare industry regulations regarding data privacy and document processing.
  • Industry Experience: Previous experience in industries such as finance, legal, healthcare, or other sectors that heavily rely on document data extraction.

EXL

Business Process Management / Analytics

New York

20,000+ Employees

1140 Jobs

    Key People

  • Rohit Kapoor

    Vice Chairman & CEO
  • Jasvinder Singh

    President
