
3 Open Metadata Jobs

JobPe aggregates listings for easy access, but applications are submitted directly on the original job portal.

9.0 - 13.0 years

Salary not disclosed

Haryana

On-site

About Markovate:
At Markovate, we don't just follow trends, we drive them. We transform businesses through innovative AI and digital solutions that turn vision into reality. Our team harnesses breakthrough technologies to craft bespoke strategies that align seamlessly with our clients' ambitions. From AI consulting and Gen AI development to pioneering AI agents and agentic AI, we empower our partners to lead their industries with forward-thinking precision.

We are seeking a highly experienced and innovative Senior Data Engineer with a strong background in hybrid cloud data integration, pipeline orchestration, and AI-driven data modelling.

Requirements:
- 9+ years of experience in data engineering and data architecture.
- Excellent communication and interpersonal skills, with the ability to engage with teams.
- Strong problem-solving, decision-making, and conflict-resolution abilities.
- Proven ability to work independently and lead cross-functional teams.
- Ability to thrive in a fast-paced, dynamic environment and handle sensitive issues with discretion and professionalism.
- Ability to maintain confidentiality and handle sensitive information with discretion and attention to detail.
- Strong work ethic and trustworthiness.
- Highly collaborative and team-oriented.

Responsibilities:
- Design and develop hybrid ETL/ELT pipelines using AWS Glue and Azure Data Factory (ADF).
- Process files from AWS S3 and Azure Data Lake Gen2, including schema validation and data profiling.
- Implement event-based orchestration using AWS Step Functions and Apache Airflow (Astronomer).
- Develop and maintain bronze, silver, and gold data layers using dbt or Coalesce.
- Create scalable ingestion workflows using Airbyte, AWS Transfer Family, and Rivery.
- Integrate with metadata and lineage tools such as Unity Catalog and OpenMetadata.
- Build reusable components for schema enforcement, EDA, and alerting (e.g., MS Teams); see the sketch after this listing.
- Work closely with QA teams to integrate test automation and ensure data quality.
- Collaborate with cross-functional teams, including data scientists and business stakeholders, to align solutions with AI/ML use cases.
- Document architectures, pipelines, and workflows for internal stakeholders.

Experience with:
- Cloud platforms such as AWS (Glue, Step Functions, Lambda, S3, CloudWatch, SNS, Transfer Family) and Azure (ADF, ADLS Gen2, Azure Functions, Event Grid).
- Transformation and ELT tools such as Databricks (PySpark), dbt, Coalesce, and Python.
- Data ingestion methods including Airbyte, Rivery, SFTP/Excel files, and SQL Server extracts.
- Data modeling techniques including CEDM, Data Vault 2.0, and dimensional modelling.
- Orchestration tools such as AWS Step Functions, Airflow (Astronomer), and ADF triggers.
- Monitoring and logging tools such as CloudWatch, AWS Glue metrics, MS Teams alerts, and Azure Data Explorer (ADX).
- Data governance and lineage tools: Unity Catalog, OpenMetadata, and schema drift detection.
- Version control and CI/CD using GitHub, Azure DevOps, CloudFormation, Terraform, and ARM templates.
- Cloud data platforms, ETL tools, AI/Generative AI concepts and frameworks, data warehousing solutions, big data technologies, SQL, and at least one programming language.

Great to have:
- Experience with cloud data platforms (e.g., AWS, Azure, GCP) and their data and AI services.
- Knowledge of ETL tools and frameworks (e.g., Apache NiFi, Talend, Informatica).
- Deep understanding of AI/Generative AI concepts and frameworks (e.g., TensorFlow, PyTorch, Hugging Face, OpenAI APIs).
- Experience with data modeling, data structures, and database design.
- Proficiency with data warehousing solutions (e.g., Redshift, BigQuery, Snowflake).
- Hands-on experience with big data technologies (e.g., Hadoop, Spark, Kafka).
- Proficiency in SQL and at least one programming language.

What it's like to be at Markovate:
- We thrive on collaboration and embrace every innovative idea.
- We invest in continuous learning to keep our team ahead in the AI/ML landscape.
- Transparent communication is key; every voice at Markovate is valued.
- Our agile, data-driven approach transforms challenges into opportunities.
- We offer flexible work arrangements that empower creativity and balance.
- Recognition is part of our DNA; your achievements drive our success.
- Markovate is committed to sustainable practices and positive community impact.
- Our people-first culture means your growth and well-being are central to our mission.

Location: Hybrid model, 2 days onsite.
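To make the schema-enforcement and alerting responsibility concrete, here is a minimal sketch of such a reusable component. The bucket path, expected schema, and Teams incoming-webhook URL are illustrative assumptions, not details from the posting.

```python
"""Minimal sketch of a reusable schema-enforcement + alerting helper.

Paths, the expected schema, and the webhook URL are hypothetical.
"""
import requests
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# Hypothetical contract for an incoming orders feed.
EXPECTED = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("order_date", StringType(), nullable=True),
    StructField("amount", DoubleType(), nullable=True),
])


def notify_teams(webhook_url: str, text: str) -> None:
    """Post a simple text message to an MS Teams incoming webhook."""
    requests.post(webhook_url, json={"text": text}, timeout=10).raise_for_status()


def load_bronze(spark: SparkSession, path: str, webhook_url: str) -> DataFrame:
    """Read raw CSV files with an enforced schema; alert Teams on any mismatch."""
    try:
        df = (
            spark.read.schema(EXPECTED)
            .option("mode", "FAILFAST")  # raise on rows that violate the schema
            .option("header", "true")
            .csv(path)
        )
        df.count()  # force evaluation so malformed rows surface here, not downstream
        return df
    except Exception as exc:  # schema drift, corrupt files, missing path
        notify_teams(webhook_url, f"Bronze load failed for {path}: {exc}")
        raise


if __name__ == "__main__":
    spark = SparkSession.builder.appName("bronze-ingest").getOrCreate()
    orders = load_bronze(
        spark,
        "s3://example-bucket/raw/orders/",          # hypothetical raw zone
        "https://example.webhook.office.com/hook",  # hypothetical Teams webhook
    )
    orders.show(5)
```

In practice the same validated DataFrame would feed the silver layer, with the Teams alert acting as the first line of schema-drift detection.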

Posted 1 week ago


3.0 - 7.0 years

Salary not disclosed

Pune, Maharashtra

On-site

The ideal candidate for this position should have advanced proficiency in Python, including a solid understanding of classes and inheritance, and should be well versed in EMR, Athena, Redshift, AWS Glue, IAM roles, CloudFormation (CFT optional), Apache Airflow, Git, SQL, PySpark, OpenMetadata, and data lakehouse architecture. Experience with metadata management is highly desirable, particularly with AWS services such as S3.

Key skills:
- Creating ETL pipelines.
- Deploying code on EMR.
- Querying in Athena.
- Creating Airflow DAGs to schedule ETL pipelines (see the sketch after this listing).
- Knowledge of AWS Lambda and the ability to create Lambda functions.

This is an individual-contributor role; the candidate is expected to manage client communication autonomously and proactively resolve technical issues without external assistance.
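As a rough illustration of the Airflow scheduling skill above, here is a minimal DAG that submits a Spark step to a running EMR cluster and then refreshes an Athena table. The cluster ID, bucket, script path, and table names are hypothetical placeholders, not details from the posting.

```python
"""Illustrative daily ETL DAG: EMR Spark step, then an Athena partition refresh."""
from datetime import datetime

import boto3
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["etl"])
def daily_sales_etl():
    @task
    def submit_emr_step() -> str:
        # add_job_flow_steps targets an already-running cluster; the ID is a placeholder.
        emr = boto3.client("emr")
        resp = emr.add_job_flow_steps(
            JobFlowId="j-EXAMPLECLUSTER",  # hypothetical cluster ID
            Steps=[{
                "Name": "transform-sales",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "s3://example-bucket/jobs/transform_sales.py"],
                },
            }],
        )
        return resp["StepIds"][0]

    @task
    def repair_athena_table() -> None:
        # Surfaces newly written S3 partitions to Athena; names are illustrative.
        athena = boto3.client("athena")
        athena.start_query_execution(
            QueryString="MSCK REPAIR TABLE analytics.sales_curated",
            ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
        )

    # In production an EmrStepSensor would gate the Athena task on step success;
    # the bare ordering below only sequences task submission.
    submit_emr_step() >> repair_athena_table()


daily_sales_etl()
```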

Posted 2 months ago


9.0 - 12.0 years

15 - 30 Lacs

Gurugram

Remote

We are looking for an experienced Senior Data Engineer to lead the development of scalable AWS-native data lake pipelines, with a strong focus on time series forecasting, upsert-ready architectures, and enterprise-grade data governance. This role demands end-to-end ownership of the data lifecycle, from ingestion through partitioning, versioning, QA, and lineage tracking to BI delivery. The ideal candidate will be highly proficient in AWS data services, PySpark, and versioned storage formats such as Apache Hudi or Iceberg. A strong understanding of data quality, observability, governance, and metadata management in large-scale analytical systems is critical.

Roles & Responsibilities:
- Design and implement data lake zoning (Raw → Clean → Modeled) using Amazon S3, AWS Glue, and Athena.
- Ingest structured and unstructured datasets, including POS, USDA, Circana, and internal sales data.
- Build versioned, upsert-ready ETL pipelines using Apache Hudi or Iceberg.
- Create forecast-ready datasets with lagged, rolling, and trend features for revenue and occupancy modeling (see the sketch after this listing).
- Optimize Athena datasets with partitioning, CTAS queries, and S3 metadata tagging.
- Implement S3 lifecycle policies, intelligent file partitioning, and audit logging for performance and compliance.
- Build reusable transformation logic using dbt-core or PySpark to support KPIs and time series outputs.
- Integrate data quality frameworks such as Great Expectations, custom logs, and AWS CloudWatch for field-level validation and anomaly detection.
- Apply data governance practices using tools like OpenMetadata or Atlan, enabling lineage tracking, data cataloging, and impact analysis.
- Establish QA automation frameworks for pipeline validation, data regression testing, and UAT handoff.
- Collaborate with BI, QA, and business teams to finalize schema design and deliverables for dashboard consumption.
- Ensure compliance with enterprise data governance policies and enable discovery and collaboration through metadata platforms.

Preferred Candidate Profile:
- 9-12 years of experience in data engineering.
- Deep hands-on experience with AWS Glue, Athena, S3, Step Functions, and the Glue Data Catalog.
- Strong command of PySpark, dbt-core, CTAS query optimization, and advanced partitioning strategies.
- Proven experience with versioned ingestion using Apache Hudi, Iceberg, or Delta Lake.
- Experience in data lineage, metadata tagging, and governance tooling using OpenMetadata, Atlan, or similar platforms.
- Proficiency in feature engineering for time series forecasting (lags, rolling windows, trends).
- Expertise in Git-based workflows, CI/CD, and deployment automation (Bitbucket or similar).
- Strong understanding of time series KPIs: revenue forecasts, occupancy trends, demand volatility, etc.
- Knowledge of statistical forecasting frameworks (e.g., Prophet, GluonTS, scikit-learn).
- Experience with Superset or Streamlit for QA visualization and UAT testing.
- Experience building data QA frameworks and embedding data validation checks at each stage of the ETL lifecycle.
- Independent thinker capable of designing systems that scale with evolving business logic and compliance requirements.
- Excellent communication skills for collaboration with BI, QA, data governance, and business stakeholders.
- High attention to detail, especially around data accuracy, documentation, traceability, and auditability.
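To illustrate the forecast-ready feature engineering this role calls for, here is a minimal PySpark sketch of lagged, rolling, and trend features. The table locations and column names (store_id, date, revenue) are illustrative assumptions only.

```python
"""Sketch of forecast-ready feature engineering (lags, rolling windows, trend)."""
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("forecast-features").getOrCreate()

# Hypothetical modeled-zone table of daily revenue per store.
daily = spark.read.parquet("s3://example-bucket/modeled/daily_revenue/")

w = Window.partitionBy("store_id").orderBy("date")

features = (
    daily
    # Lagged targets: yesterday and the same weekday last week.
    .withColumn("revenue_lag_1", F.lag("revenue", 1).over(w))
    .withColumn("revenue_lag_7", F.lag("revenue", 7).over(w))
    # Rolling mean over the preceding 7 rows; the current row is excluded
    # to avoid leaking the prediction target into its own features.
    .withColumn("revenue_roll_7", F.avg("revenue").over(w.rowsBetween(-7, -1)))
    # Simple trend proxy: week-over-week change of the lagged series.
    .withColumn(
        "revenue_wow",
        F.col("revenue_lag_1") - F.lag("revenue", 8).over(w),
    )
)

# Partitioned Parquet keeps downstream Athena scans narrow and cheap.
(features.write.mode("overwrite")
    .partitionBy("date")
    .parquet("s3://example-bucket/modeled/forecast_features/"))
```

Excluding the current row from the rolling window is the key design choice: it keeps every feature computable at prediction time.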

Posted date not available
