Role: International Data Collection Intern
Location: Remote / Hybrid (India-based, working with global data)
Duration: 3–6 months
Department: Data Operations & Research
Reports To: Data Operations Manager

Role Overview
As an International Data Collection Intern, you will play a key role in sourcing, extracting, and structuring datasets from global sources. You will combine manual research with automated Python-based tools such as BeautifulSoup, Selenium, and Playwright to gather accurate, well-structured data from websites, APIs, and databases.

Key Responsibilities
• Web Scraping & Automation – Use Python (BeautifulSoup, Selenium, Playwright, Requests) to extract, structure, and clean data into CSV, JSON, or XLSX (a minimal sketch of this kind of task follows the internship postings below).
• API & Data Sourcing – Integrate with APIs and gather datasets from government portals, research databases, industry reports, and open-data repositories.
• Data Processing & Quality – Apply Pandas/NumPy for cleaning and validation, ensuring accuracy, completeness, legality, and compliance with privacy laws (GDPR, CCPA).
• Documentation & Compliance – Maintain replicable scripts, source lists, and methodologies, and adhere to licensing, ethical, and legal standards.
• Collaboration & Communication – Work with analytics, compliance, and product teams; communicate effectively in English.

Preferred Skills
Knowledge of Git/GitHub, cloud platforms (AWS/GCP/Azure), Excel/Google Sheets, multilingual abilities, and an academic background in Data Science/Computer Science.

What You’ll Gain
• Real-world experience in large-scale, international data collection projects.
• Exposure to in-demand Python scraping and automation tools.
• Hands-on experience in sectors such as finance, economics, energy, environment, and agriculture.
• Mentorship from experienced data engineers and analysts.
• Possibility of full-time placement based on performance.
Company Description
Kuinbee is a data-on-demand platform revolutionizing how individuals and businesses access, share, and monetize data. We empower researchers, analysts, entrepreneurs, and students with ready-to-use datasets and customized data solutions at their fingertips. Kuinbee is building a global community of data professionals, providing instant access to structured datasets and custom data collection services. Based in India and trusted globally, we combine generation, regulation, and validation under one roof to democratize data access and foster collaboration.

Role Description
This is an internship role for an International Data Collection Intern. The intern will be responsible for collecting, organizing, and analyzing data from various sources, ensuring data accuracy, and presenting findings in a clear and concise manner. The intern will work closely with the data team to support custom data collection services and community-driven data contributions. This is a remote role.

Qualifications
• Strong analytical skills
• Excellent communication skills
• Basic understanding of finance principles
• Strong problem-solving abilities and attention to detail
• Ability to work independently and remotely
• Relevant coursework or experience in data collection or analysis is a plus
• Currently pursuing or completed a degree in a related field (e.g., Data Science, Business, Economics)

Stipend: Unpaid
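For illustration only, the kind of scraping-and-cleaning task described in the internship postings above might, in its simplest form, look like the Python sketch below. The URL, table layout, and column names are hypothetical, and a real script would also respect robots.txt, site terms, and rate limits.

# Minimal illustrative sketch (hypothetical source and columns): Requests and
# BeautifulSoup for extraction, Pandas for structuring, cleaning, and export.
import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = "https://example.org/open-data/indicators"  # hypothetical open-data page

resp = requests.get(URL, timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")

# Pull rows out of the first HTML table on the page (structure assumed).
rows = []
for tr in soup.select("table tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 3:
        rows.append(cells)

# Structure, clean, and validate before exporting to CSV and JSON.
df = pd.DataFrame(rows, columns=["country", "indicator", "value"])
df["value"] = pd.to_numeric(df["value"], errors="coerce")  # coerce bad numbers to NaN
df = df.dropna(subset=["value"]).drop_duplicates()

df.to_csv("indicators.csv", index=False)
df.to_json("indicators.json", orient="records")

A real task would add polite delays, handling for missing or changed tables, and a record of the source and retrieval date to meet the documentation requirements listed above.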
Company: Kuinbee
Location: Pune, Maharashtra
Mode: Hybrid
Role Type: Full-Time

About Kuinbee
Kuinbee is building a unified data ecosystem that combines a scalable data marketplace with an end-to-end AI-driven pipeline. Our platform enables automated ingestion, transformation, quality checks, lineage tracking, modelling, and metadata intelligence, allowing organisations to integrate, manage, and operationalise their data with minimal engineering effort. By merging marketplace accessibility with intelligent automation, Kuinbee aims to redefine how modern data systems are built, governed, and scaled.

Role Overview
The ideal candidate will have deep knowledge of end-to-end data workflows, strong architectural thinking, and the ability to translate engineering processes into modular, automated agents. You will work closely with the product and AI teams to formalize the logic that powers Kuinbee’s data automation platform.

Key Responsibilities
• Document complete pipeline flows from source to serving, including raw, clean, transformed, and model-ready stages.
• Identify technical pain points in real-world pipelines, including failure modes, schema drift, refresh inconsistencies, and orchestration issues (a minimal sketch of such checks follows this posting).
• Demonstrate how heterogeneous sources such as databases, APIs, files, and streams are combined, validated, modelled, and monitored.
• Present two to three real pipelines you have built, including architecture diagrams, decisions, and recovery strategies.
• Collaborate with AI engineers to design agent equivalents for schema mapping, data cleaning, transformations, validation, and lineage.
• Define metadata requirements for Kuinbee’s Supermemory Layer to support governance, semantic consistency, and automated monitoring.

Core Requirements
• 5+ years of experience building and maintaining production data pipelines end to end.
• Expertise with relational databases such as Postgres, MySQL, or SQL Server.
• Experience with data warehouses including BigQuery, Snowflake, or Redshift.
• Familiarity with processing files such as Parquet, CSV, and Excel, along with API-based and streaming data.
• Advanced skills in SQL, Python, and modern transformation frameworks such as dbt.
• Hands-on experience with Spark, Dask, or other distributed compute engines.
• Experience with data quality and observability tools such as Great Expectations, Soda, or Deequ.
• Knowledge of lineage systems such as OpenLineage, DataHub, or OpenMetadata.
• Strong data modelling foundation including star schemas, semantic layers, metrics, and feature preparation.
• Experience with orchestration frameworks such as Airflow, Dagster, or Prefect.
• Understanding of performance optimization including partitioning, indexing, clustering, and query planning.
• Exposure to integrated machine learning workflows such as feature engineering and inference paths.
• Ability to design, reason about, and evaluate modern data architecture.

Bonus Skills
• Hands-on experience with LLM-powered workflows, agentic automation, or AI-driven data transformation.
• Background in architecting internal data platforms, analytical backbones, or end-to-end data infrastructure.
• Familiarity with domain-driven data modelling principles and modular data product design.
• Strong point of view on governance frameworks, metadata standards, lineage, and observability.
• Palantir Foundry certification or direct experience with Foundry-style ontology, pipelines, and operational workflows.

Compensation: Paid (Contract Based)

How to Apply
Send your CV or portfolio to arzaaan.hr@kuinbee.com.
Applicants who include examples of real pipelines or architecture documents will receive priority consideration.
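For illustration only, the schema-drift and data-quality checks mentioned in the responsibilities above could, in a minimal Pandas-only form, look like the sketch below. The table name, columns, and thresholds are hypothetical; in practice such checks would normally be formalized in a tool like Great Expectations, Soda, or Deequ and scheduled from an orchestrator such as Airflow, Dagster, or Prefect.

# Minimal illustrative sketch of pipeline quality gates (hypothetical schema and
# thresholds); dedicated data-quality tools formalize the same ideas.
import pandas as pd

EXPECTED_SCHEMA = {"order_id": "int64", "amount": "float64", "created_at": "datetime64[ns]"}
MAX_NULL_FRACTION = 0.01  # fail the batch if more than 1% of a column is null

def check_schema_drift(df: pd.DataFrame) -> list[str]:
    """Return human-readable issues: missing, unexpected, or retyped columns."""
    issues = []
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            issues.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            issues.append(f"type drift on {col}: expected {dtype}, got {df[col].dtype}")
    for col in df.columns:
        if col not in EXPECTED_SCHEMA:
            issues.append(f"unexpected column: {col}")
    return issues

def check_quality(df: pd.DataFrame) -> list[str]:
    """Basic completeness and uniqueness checks before data moves downstream."""
    issues = []
    for col in EXPECTED_SCHEMA:
        if col in df.columns and df[col].isna().mean() > MAX_NULL_FRACTION:
            issues.append(f"too many nulls in {col}")
    if "order_id" in df.columns and df["order_id"].duplicated().any():
        issues.append("duplicate order_id values")
    return issues

if __name__ == "__main__":
    batch = pd.DataFrame({
        "order_id": [1, 2, 2],
        "amount": [10.0, None, 7.5],
        "created_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02"]),
    })
    problems = check_schema_drift(batch) + check_quality(batch)
    print(problems or "batch passed all checks")

In a production pipeline, an orchestrator would run checks like these after each load and halt downstream tasks whenever issues are returned.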