We are seeking an experienced MDM Engineer with 8 to 12 years of experience to lead the development and operations of our Master Data Management (MDM) platforms. This is a hands-on technical role responsible for the backend data engineering solution within the MDM team. To succeed in this role, the candidate must have strong data engineering experience with technologies such as SQL, Python, PySpark, Databricks, AWS, and API integrations.
Roles & Responsibilities:
- Develop distributed data pipelines using PySpark on Databricks for ingesting, transforming, and publishing master data
- Write optimized SQL for large-scale data processing, including complex joins, window functions, and CTEs for MDM logic
- Implement match/merge algorithms and survivorship rules using Informatica MDM or Reltio APIs
- Build and maintain Delta Lake tables with schema evolution and versioning for master data domains
- Use AWS services like S3, Glue, Lambda, and Step Functions for orchestrating MDM workflows
- Automate data quality checks using IDQ or custom PySpark validators with rule-based profiling
- Integrate external enrichment sources (e.g., D&B, LexisNexis) via REST APIs and batch pipelines
- Design and deploy CI/CD pipelines using GitHub Actions or Jenkins for Databricks notebooks and jobs
- Monitor pipeline health using Databricks Jobs API, CloudWatch, and custom logging frameworks
- Implement fine-grained access control using Unity Catalog and attribute-based policies for MDM datasets
- Use MLflow for tracking model-based entity resolution experiments if ML-based matching is applied
- Collaborate with data stewards to expose curated MDM views via REST endpoints or Delta Sharing
Basic Qualifications and Experience:
- 8 to 13 years of experience in Business, Engineering, IT, or a related field
Functional Skills:
Must-Have Skills:
- Advanced proficiency in PySpark for distributed data processing and transformation
- Strong SQL skills for complex data modeling, cleansing, and aggregation logic
- Hands-on experience with Databricks including Delta Lake, notebooks, and job orchestration
- Deep understanding of MDM concepts including match/merge, survivorship, and golden record creation
- Experience with MDM platforms like Informatica MDM or Reltio, including REST API integration
- Proficiency in AWS services such as S3, Glue, Lambda, Step Functions, and IAM
- Familiarity with data quality frameworks and tools like Informatica IDQ or custom rule engines
- Experience building CI/CD pipelines for data workflows using GitHub Actions, Jenkins, or similar
- Knowledge of schema evolution, versioning, and metadata management in data lakes
- Ability to implement lineage and observability using Unity Catalog or third-party tools
- Comfort with Unix shell scripting or Python for orchestration and automation
- Hands-on experience with RESTful APIs for ingesting external data sources and enrichment feeds
Good-to-Have Skills:
- Experience with Tableau or Power BI for reporting MDM insights.
- Exposure to Agile practices and tools (JIRA, Confluence).
- Prior experience in Pharma/Life Sciences.
- Understanding of compliance and regulatory considerations in master data.
Professional Certifications:
- Any MDM certification (e.g., Informatica, Reltio)
- Any Data Analysis certification (SQL, Python, PySpark, Databricks)
- Any cloud certification (AWS or Azure)
Soft Skills:
- Strong analytical abilities to assess and improve master data processes and solutions.
- Excellent verbal and written communication skills, with the ability to convey complex data concepts clearly to technical and non-technical stakeholders.
- Effective problem-solving skills to address data-related issues and implement scalable solutions.
- Ability to work effectively with global, virtual teams.