Job Summary:
Supports, develops, and maintains a data and analytics platform that processes, stores, and makes data available to analysts and other consumers effectively and efficiently. Works with business and IT teams to understand requirements and apply the technologies best suited to agile data delivery at scale. Though the role category is listed as Remote, this position is designated as Hybrid.
Key Responsibilities:
Product & Business Alignment
– Collaborate with the Product Owner to align data solutions with business objectives and product vision.
Data Pipeline Development
– Design, develop, and implement efficient data pipelines for ingesting, transforming, and transporting data into Cummins Digital Core (Azure DataLake, Snowflake) from various sources, including transactional systems (ERP, CRM); a minimal pipeline sketch appears after this list.
Architecture & Standards Compliance
– Ensure alignment with AAI Digital Core and AAI Solutions Architecture standards for data pipeline design, storage architectures, and governance processes.
Automation & Optimization
– Implement and automate distributed data systems, ensuring reliability, scalability, and efficiency through monitoring, alerting, and performance tuning.
Data Quality & Governance
– Develop and enforce data governance policies, including metadata management, access control, and retention policies, while actively monitoring and troubleshooting data quality issues.
Modeling & Storage
– Design and implement conceptual, logical, and physical data models, optimizing storage architectures using distributed and cloud-based platforms (e.g., Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB).
Documentation & Best Practices
– Create and maintain data engineering documentation, including standard operating procedures (SOPs) and best practices, with guidance from senior engineers.
Tool Evaluation & Innovation
– Support proof-of-concept (POC) initiatives and evaluate emerging data tools and technologies to enhance efficiency and effectiveness.
Testing & Troubleshooting
– Participate in the testing, troubleshooting, and continuous improvement of data pipelines to ensure data integrity and usability.
Agile & DevOps Practices
– Utilize agile development methodologies, including DevOps, Scrum, and Kanban, to drive iterative improvements in data-driven applications.
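As an illustration of the pipeline work described above, the following minimal sketch shows an ingest-transform-load job written with PySpark. The source path, column names, and target location are placeholders invented for the example, not Cummins Digital Core specifics; a real pipeline would add schema enforcement, monitoring, and incremental loads.

# Minimal ingest-transform-load sketch, assuming PySpark is installed;
# paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("erp_orders_ingest").getOrCreate()

# Ingest: read raw ERP order extracts (CSV used here for simplicity).
raw = spark.read.option("header", True).csv("/landing/erp/orders/")

# Transform: standardize names, cast types, and drop rows missing a key.
orders = (
    raw.withColumnRenamed("ORDER_ID", "order_id")
       .withColumn("order_date", F.to_date("ORDER_DATE", "yyyy-MM-dd"))
       .withColumn("amount", F.col("AMOUNT").cast("decimal(18,2)"))
       .filter(F.col("order_id").isNotNull())
)

# Load: write partitioned Parquet into a curated zone of the data lake.
orders.write.mode("overwrite").partitionBy("order_date").parquet("/curated/erp/orders/")

spark.stop()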
Experience:
- Hands-on experience gained through internships, co-ops, student employment, or team-based extracurricular projects.
- Proficiency in SQL and experience developing analytical solutions (a small query sketch follows this list).
- Exposure to open-source Big Data technologies such as Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka.
- Familiarity with cloud-based, clustered computing environments and large-scale data movement applications.
- Understanding of Agile software development methodologies.
- Exposure to IoT technology and data-driven solutions.
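To make the SQL and analytics point concrete, the sketch below runs a small analytical query (an aggregate plus a window-function ranking) against an in-memory SQLite table; the table, columns, and values are invented solely for the example.

# Illustrative only: an in-memory table standing in for a warehouse table,
# queried with an aggregate and a window function.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, order_month TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'acme',   '2024-01', 120.0),
        (2, 'acme',   '2024-01',  80.0),
        (3, 'globex', '2024-01',  50.0),
        (4, 'acme',   '2024-02', 200.0);
""")

query = """
    SELECT order_month,
           customer,
           SUM(amount) AS monthly_spend,
           RANK() OVER (PARTITION BY order_month ORDER BY SUM(amount) DESC) AS spend_rank
    FROM orders
    GROUP BY order_month, customer
    ORDER BY order_month, spend_rank;
"""

# Prints each (month, customer, spend, rank) row.
for row in conn.execute(query):
    print(row)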
Technical Skills:
Programming Languages:
Proficiency in Python, Java, and/or Scala.
Database Management:
Expertise in SQL and NoSQL databases.
Big Data Technologies:
Hands-on experience with Hadoop, Spark, Kafka, and similar frameworks.
Cloud Services:
Experience with Azure, Databricks, and AWS platforms.
ETL Processes:
Strong understanding of Extract, Transform, Load (ETL) processes.
Data Replication:
Working knowledge of replication technologies such as Qlik Replicate is a plus.
API Integration:
Experience working with APIs to consume data from ERP and CRM systems (a brief consumption sketch follows).
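As a hedged illustration of the API integration item, the sketch below pages through a REST endpoint with Python's requests library; the URL, credential, and parameter names are hypothetical, and any real ERP or CRM API would define its own authentication and pagination scheme.

# Minimal sketch of consuming a paginated REST API; the endpoint, token,
# and query parameters are placeholders, not a real ERP/CRM interface.
import requests

BASE_URL = "https://example-crm.invalid/api/v1/accounts"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer <token>"}              # placeholder credential

def fetch_all_accounts(page_size=100):
    # Yield account records page by page until the API returns an empty page.
    page = 1
    while True:
        resp = requests.get(
            BASE_URL,
            headers=HEADERS,
            params={"page": page, "per_page": page_size},
            timeout=30,
        )
        resp.raise_for_status()
        records = resp.json()  # assumed to be a JSON list of records
        if not records:
            break
        yield from records
        page += 1

if __name__ == "__main__":
    for account in fetch_all_accounts():
        print(account)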