What you will do
In this vital role you will be responsible for designing, building, maintaining, analyzing, and interpreting data to provide actionable insights that drive business decisions. The role involves working with large datasets, developing reports, supporting and executing data initiatives, and visualizing data to ensure it is accessible, reliable, and efficiently managed. The ideal candidate has strong technical skills, experience with big data technologies, and a deep understanding of data architecture and ETL processes.
- Design, develop, and optimize data pipelines/workflows using Databricks (Spark, Delta Lake) for ingestion, transformation, and processing of large-scale data. Knowledge of the Medallion Architecture is an added advantage.
- Build and manage graph database solutions (e.g., Neo4j, Stardog, Amazon Neptune) to support knowledge graphs, relationship modeling, and inference use cases.
- Leverage SPARQL, Cypher, or Gremlin to query and analyze data within graph ecosystems.
- Implement and maintain data ontologies to support semantic interoperability and consistent data classification.
- Collaborate with architects to integrate ontology models with metadata repositories and business glossaries.
- Support data governance and metadata management through integration of lineage, quality rules, and ontology mapping.
- Contribute to data cataloging and knowledge graph implementations using RDF, OWL, or similar technologies.
- Collaborate with Data Architects, Business SMEs, and Data Scientists to design and develop end-to-end data pipelines that meet fast-paced business needs across geographic regions.
- Identify and resolve complex data-related challenges.
- Adhere to best practices for coding, testing, and designing reusable code/components.
- Apply data engineering best practices, including CI/CD, version control, and code modularity.
- Participate in sprint planning meetings and provide estimates on technical implementation.
Basic Qualifications:
- Master's or Bachelor's degree and 5 to 9 years of Computer Science, IT, or related field experience.
Must-Have Skills:
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
- Hands-on experience with big data technologies and platforms such as Databricks and Apache Spark (PySpark, Spark SQL), including Python for workflow orchestration and performance tuning of big data processing.
- Proficiency in SQL and other data analysis tools for extracting, transforming, and analyzing complex datasets from relational data stores.
- Strong programming skills in Python, PySpark, and SQL.
- Solid experience designing and querying graph databases (e.g., AllegroGraph, MarkLogic).
- Proficiency with ontology languages and tools (e.g., TopBraid, RDF, OWL, Protégé, SHACL).
- Familiarity with SPARQL and/or Cypher for querying semantic and property graphs.
- Experience working with cloud data services (Azure, AWS, or GCP).
- Strong understanding of data modeling, entity relationships, and semantic interoperability.
Preferred Qualifications:
- Experience with software engineering best practices, including but not limited to version control, infrastructure-as-code, CI/CD, and automated testing.
- Knowledge of Python/R, Databricks, and cloud data platforms.
- Strong understanding of data governance frameworks, tools, and best practices.
- Knowledge of data protection regulations and compliance requirements (e.g., GDPR, CCPA).
- Graph database-related certifications.
Professional Certifications:
- AWS Certified Data Engineer preferred.
- Databricks certification preferred.
Soft Skills:
- Excellent critical-thinking and problem-solving skills
- Strong communication and collaboration skills
- Demonstrated ability to work effectively in a team setting
- Demonstrated presentation skills