Pentaho Specialist Latest (PDI / Data Catalog / Data Optimizer, AI)

Experience

5 - 10 years

Salary

12 - 20 Lacs

Posted: 1 week ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Summary

Pentaho Specialist

This role requires strong ETL/ELT skills, cloud/on-prem deployment experience, metadata governance knowledge, and the ability to integrate Pentaho with modern data ecosystems (PostgreSQL, BigQuery, Snowflake, APIs, files, S3/GCS/Azure).

Key Responsibilities

1. ETL/ELT Development using Pentaho Data Integration (PDI)

  • Design and develop complex transformations & jobs using the latest PDI (Spoon).
  • Build scalable ETL pipelines for structured & unstructured data.
  • Implement error handling, partitioning, parallelism, and reusable sub-components.
  • Integrate with:
    • PostgreSQL, Oracle, SQL Server
    • BigQuery, Snowflake, Redshift
    • APIs (REST/JSON/XML)
    • S3, GCS, Azure Blob
    • Flat files, Excel, CSV, XML, JSON
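
Since production pipelines rarely run inside Spoon, here is a minimal Python sketch of headless execution via Pan, PDI's command-line transformation runner. The install path, .ktr file, and parameter name are hypothetical:

    import subprocess

    # Hypothetical paths: pan.sh ships with PDI and executes .ktr transformations.
    PAN = "/opt/pentaho/data-integration/pan.sh"
    KTR = "/etl/load_customers.ktr"  # hypothetical transformation file

    # -level sets log verbosity; -param passes named parameters into the transformation.
    result = subprocess.run(
        [PAN, f"-file={KTR}", "-level=Basic", "-param:RUN_DATE=2025-01-01"],
        capture_output=True,
        text=True,
    )

    # Pan exits non-zero when the transformation fails.
    if result.returncode != 0:
        raise RuntimeError(f"PDI transformation failed:\n{result.stderr}")
    print("Transformation completed")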

2. Pentaho Data Catalog (New Module)

  • Setup & manage

    enterprise data catalogs

    .
  • Implement:
    • Metamodeling
    • Lineage tracking
    • Dataset registration
    • Governance & access control
    • Data profiling and metadata versioning
  • Integrate catalog with BI tools, databases, and data lakes.
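
As an illustration of scripted dataset registration, a hedged sketch of a REST call follows; the endpoint, payload fields, and credential variable are hypothetical placeholders rather than the actual Pentaho Data Catalog API:

    import os

    import requests

    CATALOG_URL = "https://catalog.example.com/api/v1/datasets"  # hypothetical endpoint
    token = os.environ["CATALOG_TOKEN"]  # hypothetical credential variable

    payload = {
        "name": "sales_orders",
        "source": "postgresql://warehouse/sales",  # where the dataset physically lives
        "owner": "data-engineering",
        "tags": ["finance", "quarterly"],
    }

    resp = requests.post(
        CATALOG_URL,
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    print("Registered dataset:", resp.json().get("id"))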

3. Pentaho Data Quality

  • Implement data validation, cleansing, standardization, and deduplication workflows.
  • Build rules-based and regex-based quality checks.
  • Identify and remediate data inconsistencies across systems.
  • Generate profiling & quality reports for stakeholders.
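
A minimal pandas sketch of the validation, standardization, and deduplication workflow above; the input file and column names are hypothetical:

    import pandas as pd

    df = pd.read_csv("customers.csv")  # hypothetical input extract

    # Regex-based check: flag rows whose email fails a simple pattern.
    email_ok = df["email"].astype(str).str.match(r"^[\w.+-]+@[\w-]+\.[\w.]+$")

    # Rules-based check: age must fall within a plausible business range.
    age_ok = df["age"].between(0, 120)

    # Standardization: trim whitespace and normalize casing before comparison.
    df["name"] = df["name"].astype(str).str.strip().str.title()

    # Deduplication: keep the most recent record per customer_id.
    clean = (
        df[email_ok & age_ok]
        .sort_values("updated_at")
        .drop_duplicates(subset="customer_id", keep="last")
    )

    # A one-line profiling summary for the stakeholder quality report.
    print(f"input={len(df)} valid={(email_ok & age_ok).sum()} deduped={len(clean)}")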

4. Pentaho Data Optimizer

  • Identify redundant, obsolete, and trivial (ROT) data.
  • Build pipelines for:
    • Intelligent tiering
    • Space optimization
    • Archiving & lifecycle management
  • Recommend storage optimization strategies based on usage.
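
Outside the product itself, ROT detection can be approximated with a last-access policy. A sketch, assuming a hypothetical landing directory and a 90-day archive threshold:

    import time
    from pathlib import Path

    ARCHIVE_AFTER_DAYS = 90  # hypothetical tiering policy
    now = time.time()

    for path in Path("/data/landing").rglob("*"):  # hypothetical data directory
        if not path.is_file():
            continue
        # st_atime is unreliable on noatime mounts; a real pipeline would
        # prefer catalog or access-log metadata.
        idle_days = (now - path.stat().st_atime) / 86400
        if idle_days > ARCHIVE_AFTER_DAYS:
            # A production job would move the file to a cold tier
            # (S3 Glacier, GCS Archive, Azure Cool) instead of just reporting it.
            print(f"ROT candidate: {path} (idle {idle_days:.0f} days)")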

5. Pentaho Data Analytics

  • Build dashboards, reports, and analytical cubes.
  • Implement drill-downs, KPIs, and interactive visualizations.
  • Connect datasets from PDI and catalog to analytics layer.
  • Integrate with enterprise BI: Tableau, Power BI, Looker.

6. API, Microservice & Workflow Integrations

  • Consume and automate API-based ETL flows.
  • Integrate Pentaho with:
    • ERP (SAP, Odoo, Oracle, BC)
    • CRM (Salesforce, HubSpot)
    • Messaging queues (Kafka, Pub/Sub, SQS)
  • Deploy PDI jobs via Pentaho Server, Carte, or command-line execution.
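
The API-based ETL pattern above can be sketched as a paginated pull into a staging table; the endpoint, field names, and connection string are hypothetical:

    import psycopg2  # assumes a PostgreSQL target, matching the stack above
    import requests

    url = "https://api.example.com/v1/orders"    # hypothetical paginated API
    conn = psycopg2.connect("dbname=warehouse")  # hypothetical target DSN

    with conn, conn.cursor() as cur:
        while url:
            page = requests.get(url, timeout=30)
            page.raise_for_status()
            body = page.json()
            for order in body["results"]:
                # Upsert keeps the load idempotent when the job is re-run.
                cur.execute(
                    """INSERT INTO staging.orders (id, amount, placed_at)
                       VALUES (%s, %s, %s)
                       ON CONFLICT (id) DO UPDATE SET amount = EXCLUDED.amount""",
                    (order["id"], order["amount"], order["placed_at"]),
                )
            url = body.get("next")  # follow pagination until exhausted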

7. Performance Optimization & Monitoring

  • Tune transformations for high performance.
  • Use partitioning, parallel execution, and memory optimization.
  • Implement monitoring for job failures, SLA breaches, and data discrepancies.
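
Failure and SLA monitoring can start as a scheduled query over a job-run log. A sketch, assuming a hypothetical job_runs table with status and timing columns:

    import psycopg2

    SLA_MINUTES = 30  # hypothetical SLA threshold
    conn = psycopg2.connect("dbname=etl_meta")  # hypothetical logging database

    with conn, conn.cursor() as cur:
        cur.execute(
            """SELECT job_name, status,
                      EXTRACT(EPOCH FROM (finished_at - started_at)) / 60
               FROM job_runs
               WHERE started_at > now() - interval '1 day'"""
        )
        for job_name, status, minutes in cur.fetchall():
            if status != "SUCCESS":
                print(f"ALERT: {job_name} failed")  # wire this to email/Slack
            elif minutes is not None and minutes > SLA_MINUTES:
                print(f"ALERT: {job_name} breached SLA ({minutes:.0f} min)")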

8. Documentation, Versioning & Collaboration

  • Document job designs, lineage, schemas, and business rules.
  • Maintain versioning (GitLab/GitHub/Bitbucket).
  • Work collaboratively with data engineers, analysts, BI teams, and business stakeholders.

Required Technical Skills

Pentaho Stack (Latest Releases)

  • Pentaho Data Integration (PDI) – Advanced
  • Pentaho Data Catalog – Good knowledge
  • Pentaho Data Quality – Must have
  • Pentaho Data Optimizer – Preferred
  • Pentaho Data Analytics – Working knowledge

Data Engineering / ETL Skills

  • ETL/ELT design principles
  • SQL (strong)
  • Data modeling (dimensional/star schema)
  • Error handling frameworks
  • Performance tuning and parallel processing

Ecosystem Skills

  • Databases: PostgreSQL, Oracle, SQL Server, MySQL
  • Cloud: GCP (BigQuery), AWS (S3/Redshift), Azure (Blob/Synapse)
  • Data lakes: Delta/Parquet/ORC/Avro
  • API integrations (REST, JSON, XML)

Scripting / Tools

  • Python or Shell scripting
  • Git version control
  • CI/CD basics for data pipelines
  • Linux admin (preferred)

Experience Required

  • Total Experience: 3–10 years
  • Relevant Pentaho Experience: minimum 2–7 years
  • Experience with the latest Pentaho platform upgrades & modernization (preferred)

Preferred Skills

  • Pentaho Server administration
  • Migration from older Pentaho versions to the latest version
  • Metadata governance / MDM understanding
  • Exposure to DataOps / cloud ETL frameworks
  • Understanding of connectors between Pentaho & LLM pipelines
  • Big data processing (Spark/Hadoop) is a plus

Company

Tenth Planet Technologies
Software Development
Innovation City
