We are seeking a Reference Data Sr Associate Engineer who, as a member of the Reference Data Product team within the Enterprise Data Management organization, will be responsible for managing and promoting the use of reference data, partnering with business Subject Matter Experts on the creation of vocabularies, taxonomies, and ontologies, and developing analytic solutions using semantic technologies.
Roles & Responsibilities:
- Work with the Reference Data Product Owner, external resources, and other engineers as part of the product team
- Develop and maintain semantically appropriate concepts
- Identify and address conceptual gaps in both content and taxonomy
- Maintain ontology source vocabularies for new or edited codes
- Support product teams to help them leverage taxonomic solutions
- Analyze data from public and internal datasets
- Develop a data model/schema for taxonomies
- Create taxonomies in the Semaphore Ontology Editor
- Bulk-import data templates into Semaphore to add or update terms in taxonomies
- Prepare SPARQL queries to generate ad hoc reports (see the SPARQL sketch after this list)
- Perform gap analysis on current and updated data
- Maintain taxonomies in Semaphore through the Change Management process
- Develop and optimize automated data ingestion pipelines in Python/PySpark when APIs are available (see the PySpark sketch after this list)
- Collaborate with cross-functional teams to understand data requirements and design solutions that meet business needs
- Identify and resolve complex data-related challenges
- Participate in sprint planning meetings and provide estimates for technical implementation
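
As a minimal sketch of the ad hoc SPARQL reporting mentioned above, the example below uses rdflib to query a SKOS-style taxonomy export and list each concept with its preferred label and broader concept. The file name and graph contents are hypothetical placeholders, not a description of any actual Semaphore model.

```python
# Minimal sketch: run an ad hoc SPARQL report over a SKOS-style taxonomy
# export. File name and data are hypothetical placeholders.
from rdflib import Graph

graph = Graph()
graph.parse("taxonomy_export.ttl", format="turtle")  # hypothetical export file

# List every concept with its preferred label and (optional) broader concept.
query = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label ?broader
WHERE {
    ?concept a skos:Concept ;
             skos:prefLabel ?label .
    OPTIONAL { ?concept skos:broader ?broader . }
}
ORDER BY ?label
"""

for row in graph.query(query):
    print(row.concept, row.label, row.broader)
```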
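
For the Python/PySpark ingestion responsibility, a minimal sketch is shown below, assuming a public dataset exposed as a JSON API. The endpoint, field names, and target table are illustrative assumptions only; a production pipeline would add paging, retries, and schema validation.

```python
# Minimal PySpark ingestion sketch: pull records from a hypothetical public
# API, normalize them, and land them as a staging table.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("reference-data-ingestion").getOrCreate()

# Hypothetical endpoint and field names.
records = requests.get("https://example.org/api/terms", timeout=30).json()

df = spark.createDataFrame(records)  # infers schema from the JSON records

cleaned = (
    df.dropDuplicates(["code"])          # assumes 'code' uniquely identifies a term
      .withColumnRenamed("label", "preferred_label")
)

cleaned.write.mode("overwrite").saveAsTable("reference_data.terms_staging")
```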
Basic Qualifications and Experience:
- Any degree with 5-9 years of experience in Business, Engineering, IT, or a related field
Functional Skills:
Must-Have Skills:
- Knowledge of controlled vocabularies, classification, ontology and taxonomy
- Experience in ontology development using Progress Semaphore or a similar tool such as PoolParty
- Hands-on experience writing SPARQL queries on graph data
- Excellent problem-solving skills and the ability to work with large, complex datasets
- Strong understanding of data modeling, data warehousing, and data integration concepts
Good-to-Have Skills:
- Hands-on experience writing SQL using any RDBMS (Redshift, Postgres, MySQL, Teradata, Oracle, etc.)
- Experience using cloud services such as AWS, Azure, or GCP
- Experience working in a product team environment
- Knowledge of Python/R, Databricks, cloud data platforms
- Knowledge of NLP (Natural Language Processing) and AI (Artificial Intelligence) for extracting and standardizing controlled vocabularies.
- Strong understanding of data governance frameworks, tools, and best practices
Professional Certifications:
- Databricks and Progress Semaphore certifications preferred
- SAFe Practitioner Certificate preferred
- Any Data Analysis certification (SQL, Python)
- Any cloud certification (AWS or Azure)
Soft Skills:
- Strong analytical abilities to assess and improve master data processes and solutions.
- Excellent verbal and written communication skills, with the ability to convey complex data concepts clearly to technical and non-technical stakeholders.
- Effective problem-solving skills to address data-related issues and implement scalable solutions.
- Ability to work effectively with global, virtual teams