
211 Data Lakes Jobs - Page 9

JobPe aggregates listings for easy access, but applications are submitted directly on the original job portal.

10.0 - 14.0 years

10 - 16 Lacs

Pune

Work from Office

Role Overview:
The Senior Tech Lead - GCP Data Engineering leads the design, development, and optimization of advanced data solutions. The jobholder has extensive experience with GCP services, data architecture, and team leadership, with a proven ability to deliver scalable and secure data systems.

Responsibilities:
- Lead the design and implementation of GCP-based data architectures and pipelines.
- Architect and optimize data solutions using GCP services such as BigQuery, Dataflow, Pub/Sub, and Cloud Storage.
- Provide technical leadership and mentorship to a team of data engineers.
- Collaborate with stakeholders to define project requirements and ensure alignment with business goals.
- Ensure best practices in data security, governance, and compliance.
- Troubleshoot and resolve complex technical issues in GCP data environments.
- Stay updated on the latest GCP technologies and industry trends.

Key Technical Skills & Responsibilities:
- Overall 10+ years of experience with GCP and data warehousing concepts, including coding, reviewing, testing, and debugging.
- Experience as an architect on GCP implementation or migration data projects.
- Strong understanding of data lakes and data lake architectures, including best practices for storing, loading, and retrieving data from data lakes.
- Experience developing and maintaining pipelines on the GCP platform, with an understanding of best practices for bringing on-premises data to the cloud: file loading, compression, parallelization of loads, optimization, etc.
- Working knowledge of and/or experience with Google Data Studio, Looker, and other visualization tools.
- Working knowledge of Hadoop and Python/Java is an added advantage.
- Experience designing and planning BI solutions; debugging, monitoring, and troubleshooting BI solutions; creating and deploying reports; and writing relational and multidimensional database queries.
- Any experience in a NoSQL environment is a plus.
- Must be proficient in Python and PySpark for data pipeline building.
- Must have experience working with streaming data sources and Kafka.
- GCP services: Cloud Storage, BigQuery, Bigtable, Cloud Spanner, Cloud SQL, Datastore/Firestore, Dataflow, Dataproc, Data Fusion, Dataprep, Pub/Sub, Data Studio, Looker, Data Catalog, Cloud Composer, Cloud Scheduler, Cloud Functions.

Eligibility Criteria:
- Bachelor's degree in Computer Science, Data Engineering, or a related field.
- Extensive experience with GCP data services and tools.
- GCP certification (e.g., Professional Data Engineer, Professional Cloud Architect).
- Experience with machine learning and AI integration in GCP environments.
- Strong understanding of data modeling, ETL/ELT processes, and cloud integration.
- Proven leadership experience in managing technical teams.
- Excellent problem-solving and communication skills.
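For illustration only: the streaming-ingestion work this role describes (Kafka or Pub/Sub sources landing in Cloud Storage ahead of BigQuery) could look roughly like the minimal PySpark Structured Streaming sketch below. The broker address, topic, schema, and bucket paths are placeholders, and the job assumes the spark-sql-kafka package and the GCS connector are available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Hypothetical event schema, for illustration only.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

spark = SparkSession.builder.appName("kafka-to-gcs-ingest").getOrCreate()

# Read a Kafka topic as a structured stream (broker and topic are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")
       .option("subscribe", "orders")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers bytes; cast the value and parse the JSON payload into columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json_value")
          .select(from_json(col("json_value"), event_schema).alias("e"))
          .select("e.*"))

# Land the stream as Parquet on Cloud Storage, with checkpointing for recovery.
query = (events.writeStream
         .format("parquet")
         .option("path", "gs://example-bucket/landing/orders/")
         .option("checkpointLocation", "gs://example-bucket/checkpoints/orders/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```

From a Parquet landing zone like this, a load into BigQuery would typically be scheduled separately, for example via Cloud Composer.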

Posted 3 months ago

Apply

7.0 - 11.0 years

9 - 13 Lacs

Mumbai

Work from Office

Type: Contract | Duration: 6 Months

We are seeking an experienced Data Engineer to join our team for a 6-month contract assignment. The ideal candidate will work on data warehouse development, ETL pipelines, and analytics enablement using Snowflake, Azure Data Factory (ADF), dbt, and other tools. This role requires strong hands-on experience with data integration platforms, documentation, and pipeline optimization, especially in cloud environments such as Azure and AWS.

Key Responsibilities:
- Build and maintain ETL pipelines using Fivetran, dbt, and Azure Data Factory
- Monitor and support production ETL jobs
- Develop and maintain data lineage documentation for all systems
- Design data mapping and documentation to aid QA/UAT testing
- Evaluate and recommend modern data integration tools
- Optimize shared data workflows and batch schedules
- Collaborate with Data Quality Analysts to ensure accuracy and integrity of data flows
- Participate in performance tuning and improvement recommendations
- Support BI/MDM initiatives including Data Vault and Data Lakes

Required Skills:
- 7+ years of experience in data engineering roles
- Strong command of SQL, with 5+ years of hands-on development
- Deep experience with Snowflake, Azure Data Factory, and dbt
- Strong background with ETL tools (Informatica, Talend, ADF, dbt, etc.)
- Bachelor's in CS, Engineering, Math, or related field
- Experience in the healthcare domain (working with PHI/PII data)
- Familiarity with scripting/programming (Python, Perl, Java, Linux-based environments)
- Excellent communication and documentation skills
- Experience with BI tools like Power BI, Cognos, etc.
- Organized self-starter with strong time-management and critical-thinking abilities

Nice to Have:
- Experience with Data Lakes and Data Vaults
- QA and UAT alignment with clear development documentation
- Multi-cloud experience (especially Azure, AWS)
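As a rough sketch of the Snowflake-side work mentioned above (loading staged files and reconciling counts to support QA/UAT), the snippet below uses the snowflake-connector-python package; the account details, stage, table name, and file format are hypothetical placeholders.

```python
import os

import snowflake.connector  # pip install snowflake-connector-python

# Connection details are illustrative; in practice they come from a secrets store.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ANALYTICS_WH",
    database="RAW",
    schema="HEALTHCARE",
)

try:
    cur = conn.cursor()
    # Load a staged file into a hypothetical landing table, then run a simple
    # row-count reconciliation that a QA/UAT checklist might reference.
    cur.execute(
        "COPY INTO claims_landing FROM @claims_stage "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
    cur.execute("SELECT COUNT(*) FROM claims_landing")
    loaded_rows = cur.fetchone()[0]
    print(f"claims_landing now holds {loaded_rows} rows")
finally:
    conn.close()
```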

Posted 3 months ago

Apply

8 - 13 years

12 - 22 Lacs

Hyderabad, Bengaluru, Mumbai (All Areas)

Work from Office

Greetings of the day!

We have an urgent on-rolls opening for the position of "Snowflake Architect" with one of our reputed clients (work from home).

Name of the Company: Confidential
Rolls: On-rolls
Mode of Employment: FTE / Sub-Con / Contract
Job Location: Remote
Job Work Timings: Night shift, 06:00 pm to 03:00 am IST
Nature of Work: Work from home
Working Days: 5 days per week
Educational Qualification: Bachelor's degree in computer science, BCA, engineering, or a related field
Salary: Maximum CTC would be 23 LPA (salary and benefits package will be commensurate with experience and qualifications; PF and medical insurance cover available)
Languages Known: English, Hindi, and the local language
Experience: 9+ years of relevant experience in the same domain

Job Summary:
We are seeking a highly skilled and experienced Snowflake Architect to lead the design, development, and implementation of scalable, secure, and high-performance data warehousing solutions on the Snowflake platform. The ideal candidate will possess deep expertise in data modelling, cloud architecture, and modern ELT frameworks. You will be responsible for architecting robust data pipelines, optimizing query performance, and ensuring enterprise-grade data governance and security. In this role, you will collaborate with data engineers, analysts, and business stakeholders to deliver efficient data solutions that drive informed decision-making across the organization.

Key Responsibilities:
- Manage and maintain the Snowflake platform to ensure optimal performance and reliability.
- Collaborate with data engineers and analysts to design and implement data pipelines.
- Develop and optimize SQL queries for efficient data retrieval and manipulation.
- Create custom scripts and functions using JavaScript and Python to automate platform tasks.
- Troubleshoot platform issues and provide timely resolutions.
- Implement security best practices to protect data within the Snowflake platform.
- Stay updated on the latest Snowflake features and best practices to continuously improve platform performance.

Required Qualifications:
- Bachelor's degree in computer science, engineering, or a related field.
- Minimum of nine years of experience managing any database platform.
- Proficiency in SQL for data querying and manipulation.
- Strong programming skills in JavaScript and Python.
- Experience in optimizing and tuning Snowflake for performance.

Preferred Skills:
- Technical expertise
- Cloud and integration
- Performance and optimization
- Security and governance
- Soft skills

The candidate should be willing to join within 07-10 days or be an immediate joiner.

Interested candidates, please share your updated resume with us at executivehr@monalisammllp.com, or call/WhatsApp us at 9029895581, along with the following details:
- Current/last net in-hand salary (salary will be offered based on the interview/technical evaluation process)
- Notice period and last working day
- Reason for changing the job
- Total years of experience in the specific field
- Current location
- Whether you hold an offer from any other organization

Regards,
Monalisa Group of Services
HR Department
9029895581 (call/WhatsApp)
executivehr@monalisammllp.com
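To illustrate the "custom scripts to automate platform tasks" responsibility above, here is a small, assumption-laden Python sketch that flags the slowest recent queries from SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY so they can be reviewed for tuning; the credentials, role, look-back window, and row limit are placeholders.

```python
import os

import snowflake.connector  # pip install snowflake-connector-python

# Illustrative credentials; a real deployment would use key-pair auth or a secrets manager,
# and a role with ACCOUNT_USAGE access rather than ACCOUNTADMIN.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="ACCOUNTADMIN",
)

# Surface the slowest successful queries of the last 24 hours for tuning review.
SLOW_QUERY_SQL = """
    SELECT query_id,
           warehouse_name,
           total_elapsed_time / 1000 AS elapsed_seconds,
           LEFT(query_text, 120)     AS query_preview
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
      AND execution_status = 'SUCCESS'
    ORDER BY total_elapsed_time DESC
    LIMIT 10
"""

try:
    for row in conn.cursor().execute(SLOW_QUERY_SQL):
        print(row)
finally:
    conn.close()
```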

Posted 4 months ago

Apply

5 - 8 years

16 - 18 Lacs

Pune

Work from Office

Oracle PL/SQL Developer

The associate shall perform the role of PL/SQL Developer and shall be responsible for the following:
- Hands-on coding in SQL and PL/SQL.
- Create and implement database architecture for new applications and enhancements to existing applications.
- Hands-on experience in data modeling, SSAS, cubes, and query optimization.
- Create and implement strategies for partitioning, archiving, and maturity models for applications.
- Review queries created by other developers for adherence to standards and for performance issues.
- PL/SQL, T-SQL, SQL query optimization, data models, data lakes.
- Interact with database and application analysts and business users for estimations.
- Perform impact analysis of existing applications and suggest the best ways of incorporating new requirements.
- Proactively engage in the remediation of software issues related to code quality, security, and/or patterns/frameworks.
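As a hedged illustration of the query-review work listed above, the sketch below uses the python-oracledb driver to capture and print an execution plan for a candidate query; the connection string and the partitioned ORDERS table are hypothetical.

```python
import oracledb  # pip install oracledb (thin mode needs no Oracle client install)

# Connection details are placeholders for illustration only.
conn = oracledb.connect(user="app_ro", password="secret", dsn="dbhost:1521/ORCLPDB1")

with conn.cursor() as cur:
    # Ask the optimizer for the execution plan of a candidate query
    # (a hypothetical partitioned ORDERS table is assumed here).
    cur.execute("""
        EXPLAIN PLAN FOR
        SELECT order_id, order_total
        FROM   orders
        WHERE  order_date >= DATE '2024-01-01'
    """)

    # DBMS_XPLAN renders the captured plan, which is what a reviewer would
    # inspect for full scans, missing indexes, or absent partition pruning.
    cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
    for (line,) in cur:
        print(line)

conn.close()
```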

Posted 4 months ago

Apply

2.0 - 3.0 years

4 - 5 Lacs

Pune

Work from Office

The Data Engineer supports, develops, and maintains a data and analytics platform to efficiently process, store, and make data available to analysts and other consumers. This role collaborates with Business and IT teams to understand requirements and best leverage technologies for agile data delivery at scale.

Note: Even though the role is categorized as Remote, it will follow a hybrid work model.

Key Responsibilities:
- Implement and automate deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
- Develop and operate large-scale data storage and processing solutions using cloud-based platforms (e.g., Data Lakes, Hadoop, HBase, Cassandra, MongoDB, DynamoDB).
- Ensure data quality and integrity through continuous monitoring and troubleshooting.
- Implement data governance processes, managing metadata, access, and data retention.
- Develop scalable, efficient, and quality data pipelines with monitoring and alert mechanisms.
- Design and implement physical data models and storage architectures based on best practices.
- Analyze complex data elements and systems, data flow, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
- Participate in testing and troubleshooting of data pipelines.
- Utilize agile development technologies such as DevOps, Scrum, and Kanban for continuous improvement in data-driven applications.

External Qualifications and Competencies

Qualifications, Skills, and Experience:

Must-Have:
- 2-3 years of experience in data engineering with expertise in Azure Databricks and Scala/Python.
- Hands-on experience with Spark (Scala/PySpark) and SQL.
- Strong understanding of Spark Streaming, Spark internals, and query optimization.
- Proficiency in Azure cloud services.
- Agile development experience.
- Experience in unit testing of ETL pipelines.
- Expertise in creating ETL pipelines integrating ML models.
- Knowledge of big data storage strategies (optimization and performance).
- Strong problem-solving skills.
- Basic understanding of data models (SQL/NoSQL), including Delta Lake or Lakehouse.
- Exposure to Agile software development methodologies.
- Quick learner with adaptability to new technologies.

Nice-to-Have:
- Understanding of the ML lifecycle.
- Exposure to big data open-source technologies.
- Experience with clustered compute cloud-based implementations.
- Familiarity with developing applications requiring large file movement in cloud environments.
- Experience in building analytical solutions.
- Exposure to IoT technology.

Competencies:
- System Requirements Engineering: Translates stakeholder needs into verifiable requirements.
- Collaborates: Builds partnerships and works collaboratively with others.
- Communicates Effectively: Develops and delivers clear communications for various audiences.
- Customer Focus: Builds strong customer relationships and delivers customer-centric solutions.
- Decision Quality: Makes timely and informed decisions to drive progress.
- Data Extraction: Performs ETL activities from various sources using appropriate tools and technologies.
- Programming: Writes and tests computer code using industry standards, tools, and automation.
- Quality Assurance Metrics: Applies measurement science to assess solution effectiveness.
- Solution Documentation: Documents and communicates solutions to enable knowledge transfer.
- Solution Validation Testing: Ensures configuration changes meet design and customer requirements.
- Data Quality: Identifies and corrects data flaws to support governance and decision-making.
- Problem Solving: Uses systematic analysis to identify and resolve issues effectively.
- Values Differences: Recognizes and values diverse perspectives and cultures.

Additional Responsibilities Unique to this Position

Education, Licenses, and Certifications:
- College, university, or equivalent degree in a relevant technical discipline, or equivalent experience, required.
- This position may require licensing for compliance with export controls or sanctions regulations.

Work Schedule: Work primarily with stakeholders in the US, requiring a 2-3 hour overlap during EST hours as needed.
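A minimal sketch of the kind of Databricks/PySpark pipeline this posting describes, moving raw JSON from a bronze zone into a cleansed Delta table; the storage paths, column names, and partition column are assumptions, and the Delta format is assumed to be available (as it is by default on Databricks).

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_timestamp

# On Azure Databricks a SparkSession already exists; getOrCreate() simply reuses it.
spark = SparkSession.builder.appName("events-bronze-to-silver").getOrCreate()

# Hypothetical lake paths used purely for illustration.
bronze_path = "abfss://lake@examplestorage.dfs.core.windows.net/bronze/events"
silver_path = "abfss://lake@examplestorage.dfs.core.windows.net/silver/events"

# Read raw JSON landed in the bronze zone of the lake.
bronze = spark.read.json(bronze_path)

# Basic cleansing: drop malformed rows, normalize types, stamp the load time.
silver = (bronze
          .where(col("event_id").isNotNull())
          .withColumn("event_ts", col("event_ts").cast("timestamp"))
          .withColumn("event_date", col("event_ts").cast("date"))
          .withColumn("ingested_at", current_timestamp()))

# Append to a Delta table partitioned by date, a common Lakehouse layout.
(silver.write
 .format("delta")
 .mode("append")
 .partitionBy("event_date")
 .save(silver_path))
```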

Posted Date not available

Apply

3.0 - 5.0 years

5 - 7 Lacs

Pune

Work from Office

Please note: even though the GPP mentions Remote, this is a hybrid role.

Key Responsibilities:
- Implement and automate deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
- Continuously monitor and troubleshoot data quality and integrity issues.
- Implement data governance processes and methods for managing metadata, access, and retention for internal and external users.
- Develop reliable, efficient, scalable, and quality data pipelines with monitoring and alert mechanisms using ETL/ELT tools or scripting languages.
- Develop physical data models and implement data storage architectures as per design guidelines.
- Analyze complex data elements and systems, data flow, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
- Participate in testing and troubleshooting of data pipelines.
- Develop and operate large-scale data storage and processing solutions using distributed and cloud-based platforms (e.g., Data Lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB).
- Use agile development technologies, such as DevOps, Scrum, Kanban, and continuous improvement cycles, for data-driven applications.

External Qualifications and Competencies

Qualifications:
- College, university, or equivalent degree in a relevant technical discipline, or relevant equivalent experience, required.
- This position may require licensing for compliance with export controls or sanctions regulations.

Competencies:
- System Requirements Engineering: Translate stakeholder needs into verifiable requirements and establish acceptance criteria.
- Collaborates: Build partnerships and work collaboratively with others to meet shared objectives.
- Communicates Effectively: Develop and deliver multi-mode communications that convey a clear understanding of the unique needs of different audiences.
- Customer Focus: Build strong customer relationships and deliver customer-centric solutions.
- Decision Quality: Make good and timely decisions that keep the organization moving forward.
- Data Extraction: Perform ETL activities from various sources and transform them for consumption by downstream applications and users.
- Programming: Create, write, and test computer code, test scripts, and build scripts using industry standards and tools.
- Quality Assurance Metrics: Apply measurement science to assess whether a solution meets its intended outcomes.
- Solution Documentation: Document information and solutions based on knowledge gained during product development activities.
- Solution Validation Testing: Validate configuration item changes or solutions using best practices.
- Data Quality: Identify, understand, and correct flaws in data to support effective information governance.
- Problem Solving: Solve problems using systematic analysis processes and industry-standard methodologies.
- Values Differences: Recognize the value that different perspectives and cultures bring to an organization.

Additional Responsibilities Unique to this Position

Skills and Experience Needed:

Must-Have:
- 3-5 years of experience in data engineering with a strong background in Azure Databricks and Scala/Python.
- Hands-on experience with Spark (Scala/PySpark) and SQL.
- Experience with Spark Streaming, Spark internals, and query optimization.
- Proficiency in Azure cloud services.
- Agile development experience.
- Unit testing of ETL.
- Experience creating ETL pipelines with ML model integration.
- Knowledge of big data storage strategies (optimization and performance).
- Critical problem-solving skills.
- Basic understanding of data models (SQL/NoSQL), including Delta Lake or Lakehouse.
- Quick learner.

Nice-to-Have:
- Understanding of the ML lifecycle.
- Exposure to big data open-source technologies.
- Experience with Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka.
- SQL query language proficiency.
- Experience with clustered compute cloud-based implementations.
- Familiarity with developing applications requiring large file movement for a cloud-based environment.
- Exposure to Agile software development.
- Experience building analytical solutions.
- Exposure to IoT technology.

Work Schedule: Most of the work will be with stakeholders in the US, with an overlap of 2-3 hours during EST hours on a need basis.
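Since this posting calls out unit testing of ETL, here is a small pytest sketch for a PySpark transformation run on a local Spark session; the clean_customers function, its columns, and the test data are hypothetical examples rather than part of any actual codebase.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, trim, upper


# Hypothetical transformation under test: drop rows without a customer code,
# then trim and upper-case the remaining codes.
def clean_customers(df):
    return (df
            .where(col("customer_code").isNotNull())
            .withColumn("customer_code", upper(trim(col("customer_code")))))


@pytest.fixture(scope="module")
def spark():
    session = (SparkSession.builder
               .master("local[1]")
               .appName("etl-unit-tests")
               .getOrCreate())
    yield session
    session.stop()


def test_clean_customers_normalizes_codes(spark):
    source = spark.createDataFrame(
        [(" ab12 ", 10.0), (None, 5.0)],
        ["customer_code", "amount"],
    )

    result = clean_customers(source).collect()

    # The null-code row is dropped; the surviving code is trimmed and upper-cased.
    assert len(result) == 1
    assert result[0]["customer_code"] == "AB12"
```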

Posted Date not available

Apply

5.0 - 9.0 years

17 - 25 Lacs

Pune, Bengaluru, Delhi / NCR

Work from Office

Job Hiring || Big Data Testing Role || MNC Company

Total Experience: 5-9 years
Notice Period: 0-30 days
Location: Bangalore, Pune, Gurgaon

Job Description:

Key Responsibilities:
- Perform manual testing of big data systems, ensuring data accuracy, integrity, and reliability.
- Validate data pipelines and workflows across tools such as PySpark, Azure Data Factory, and Azure Synapse.
- Execute SQL queries to verify data manipulation and transformations within Data Lakes.
- Develop and maintain test cases, test plans, and testing documentation.
- Ensure compliance with industry standards and best practices for big data quality assurance.

Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 5 to 8 years of experience in manual QA testing, with a focus on big data systems.
- Proficiency in tools and technologies such as PySpark, Azure Data Factory, SQL, Data Lake, and Azure Synapse.
- Strong understanding of data manipulation and validation techniques.

Preferred Qualifications:
- Experience with cloud infrastructure solutions and IoT-related data systems.
- Familiarity with automation testing tools for big data environments.
- Certifications in Azure or other relevant big data technologies.

Interested candidates can share their resume at vanshika@theglove.co.in
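As an illustration of the pipeline-validation work described above, a tester might script basic reconciliation checks in PySpark like the sketch below; the lake paths, key columns, and datasets are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, when

spark = SparkSession.builder.appName("pipeline-validation").getOrCreate()

# Hypothetical lake paths for the pre- and post-transformation datasets.
source = spark.read.parquet("abfss://lake@examplestorage.dfs.core.windows.net/raw/sales")
target = spark.read.parquet("abfss://lake@examplestorage.dfs.core.windows.net/curated/sales")

# Check 1: row counts should reconcile (or differ only by the documented reject count).
src_rows, tgt_rows = source.count(), target.count()
print(f"source rows: {src_rows}, target rows: {tgt_rows}, difference: {src_rows - tgt_rows}")

# Check 2: business-key columns in the target must not contain nulls.
null_profile = target.select([
    count(when(col(c).isNull(), c)).alias(c) for c in ["sale_id", "sale_date", "amount"]
])
null_profile.show()

# Check 3: no duplicate business keys after transformation.
duplicates = target.groupBy("sale_id").count().where(col("count") > 1)
print(f"duplicate sale_id values: {duplicates.count()}")
```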

Posted Date not available

Apply

3.0 - 8.0 years

10 - 15 Lacs

Bengaluru

Work from Office

Job Type: Contract
Experience Level: 3+ Years

Job Overview:
We are seeking an experienced Data Engineer to join our dynamic team. As a Data Engineer, you will be responsible for designing, building, and maintaining data pipelines, processing large-scale datasets, and ensuring data availability for analytics. The ideal candidate will have a strong background in distributed systems, database design, and data engineering practices, with hands-on experience working with modern data technologies.

Key Responsibilities:
- Design, implement, and optimize data pipelines using tools like Spark, Kafka, and Airflow to handle large-scale data processing and ETL tasks.
- Work with various data storage systems (e.g., PostgreSQL, MySQL, NoSQL databases) to ensure efficient and reliable data storage and retrieval.
- Collaborate with data scientists, analysts, and other stakeholders to design solutions that meet business needs and data requirements.
- Develop and maintain robust, scalable, and efficient data architectures and data warehousing solutions.
- Process structured and unstructured data from diverse sources, ensuring data is cleansed, transformed, and loaded effectively.
- Optimize query performance and troubleshoot database issues to ensure high data availability and minimal downtime.
- Implement data governance practices to ensure data integrity, security, and compliance.
- Participate in code reviews, knowledge sharing, and continuous improvement of team processes.

Required Skills & Experience:
- Minimum of 3+ years of relevant hands-on experience in data engineering.
- Extensive experience with distributed systems (e.g., Apache Spark, Apache Kafka) for large-scale data processing.
- Proficiency in SQL and experience working with relational databases like PostgreSQL and MySQL, as well as NoSQL technologies.
- Strong understanding of data warehousing concepts, ETL processes, and data pipeline design.
- Experience building and managing data pipelines using Apache Airflow or similar orchestration tools.
- Hands-on experience in data modeling, schema design, and optimizing database performance.
- Solid understanding of cloud-based data solutions (e.g., AWS, GCP, Azure); familiarity with cloud-native data tools is a plus.
- Ability to work collaboratively with cross-functional teams and communicate complex technical concepts to non-technical stakeholders.

Preferred Skills:
- Experience with containerization and orchestration tools such as Docker and Kubernetes.
- Familiarity with data lakes, data mesh, or data fabric architectures.
- Knowledge of machine learning pipelines or frameworks is a plus.
- Experience with CI/CD pipelines for data engineering workflows.

Education:
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

Mode of Work: 3 days work from office / 2 days work from home.
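To illustrate the orchestration side of this role, below is a minimal Apache Airflow DAG sketch with two placeholder tasks; the DAG id, task logic, and daily schedule are assumptions (the `schedule` argument shown requires Airflow 2.4+; older 2.x versions use `schedule_interval`).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    # Placeholder: pull one day's worth of orders from the source database.
    print("extracting orders for", context["ds"])


def load_to_warehouse(**context):
    # Placeholder: load the transformed batch into the warehouse (e.g., PostgreSQL).
    print("loading orders for", context["ds"])


with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    tags=["example"],
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    # Ordering: extraction must finish before loading starts.
    extract >> load
```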

Posted Date not available

Apply

4.0 - 6.0 years

5 - 10 Lacs

Pune, Mumbai (All Areas)

Work from Office

We are seeking a driven and experienced Nova Context Developer to strengthen the OSS practice for a leading digital solutions company specializing in Cloud, AI-AIOps, product engineering services, and system integration.

Key Responsibilities:
- Actively contribute as a team member on project implementations.
- Develop, construct, and test components of a Nova Context (Ontology) solution under the supervision of technical leads.
- Build and maintain strong technical relationships with customers.
- Support pre-sales efforts when required.
- Collaborate effectively with internal and client teams to ensure successful project delivery.
- Continuously develop consulting capabilities and professional competencies.
- Follow guidance from lead or senior consultants on assigned projects.

Key Qualifications & Requirements:
- Minimum 1 year of hands-on experience in Nova Context (Ontology) solution deployment and construction.
- 3+ years of experience managing large, data-oriented projects in a customer-facing role.
- Strong analytical skills to interpret complex datasets, identify patterns, and establish data relationships.
- Proficient in extracting data from Excel and XML and using ETL processes.
- Experience with graph database solutions (NoSQL), RDF, and graph data modeling.
- Strong command of graph query handling, especially SPARQL and PRONTO (must-have).
- Advanced scripting and development skills in Python, BASH, Perl, and Linux shell / CLI.
- Good understanding of the Telco domain (Wireless: 2G/3G/4G/5G; Wireline: GPON, Fibre; Transport: Microwave, DWDM, SDH, PDH).
- IT infrastructure knowledge, including virtualization (VMware/MS Hypervisor) and container technologies (Docker, K3s, Kubernetes).
- Familiarity with data lakes and data modeling techniques.

Additional Skills:
- Strong grasp of the SDLC and implementation best practices.
- Quality-focused with a "completer-finisher" mindset.
- Business-aware, understanding broader departmental and organizational goals.
- Self-driven with strong problem-solving skills.
- Excellent communication and relationship-building skills, including cross-cultural collaboration.
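Given the emphasis on RDF, graph data modeling, and SPARQL, here is a small self-contained sketch using the rdflib Python library; the network ontology namespace, the triples, and the query are invented for illustration and are not part of the Nova Context data model.

```python
from rdflib import RDF, Graph, Literal, Namespace, URIRef

# Build a tiny in-memory graph of network elements; names and the namespace
# are illustrative only.
NET = Namespace("http://example.org/network#")

g = Graph()
cell = URIRef("http://example.org/network#cell-4401")
g.add((cell, RDF.type, NET.Cell))
g.add((cell, NET.technology, Literal("5G")))
g.add((cell, NET.servedBy, URIRef("http://example.org/network#gnodeb-17")))

# SPARQL query: find every cell and the node that serves it.
results = g.query("""
    PREFIX net: <http://example.org/network#>
    SELECT ?cell ?node
    WHERE {
        ?cell a net:Cell ;
              net:servedBy ?node .
    }
""")

for cell_uri, node_uri in results:
    print(cell_uri, "is served by", node_uri)
```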

Posted Date not available

Apply

5.0 - 8.0 years

5 - 9 Lacs

Pune

Work from Office

Job Summary:
Leads projects for the design, development, and maintenance of a data and analytics platform. Effectively and efficiently processes, stores, and makes data available to analysts and other consumers. Works with key business stakeholders, IT experts, and subject-matter experts to plan, design, and deliver optimal analytics and data science solutions. Works on one or many product teams at a time.

Key Responsibilities:
- Designs and automates deployment of our distributed system for ingesting and transforming data from various types of sources (relational, event-based, unstructured).
- Designs and implements a framework to continuously monitor and troubleshoot data quality and data integrity issues.
- Implements data governance processes and methods for managing metadata, access, and retention of data for internal and external users.
- Designs and provides guidance on building reliable, efficient, scalable, and quality data pipelines with monitoring and alert mechanisms that combine a variety of sources using ETL/ELT tools or scripting languages.
- Designs and implements physical data models to define the database structure; optimizes database performance through efficient indexing and table relationships.
- Participates in optimizing, testing, and troubleshooting of data pipelines.
- Designs, develops, and operates large-scale data storage and processing solutions using different distributed and cloud-based platforms for storing data (e.g., Data Lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB, others).
- Uses innovative and modern tools, techniques, and architectures to partially or completely automate the most common, repeatable, and tedious data preparation and integration tasks in order to minimize manual and error-prone processes and improve productivity.
- Assists with renovating the data management infrastructure to drive automation in data integration and management.
- Ensures the timeliness and success of critical analytics initiatives by using agile development technologies such as DevOps, Scrum, and Kanban.
- Coaches and develops less experienced team members.

External Qualifications and Competencies

Competencies:
- System Requirements Engineering: Uses appropriate methods and tools to translate stakeholder needs into verifiable requirements to which designs are developed; establishes acceptance criteria for the system of interest through analysis, allocation, and negotiation; tracks the status of requirements throughout the system lifecycle; assesses the impact of changes to system requirements on project scope, schedule, and resources; creates and maintains information linkages to related artifacts.
- Collaborates: Builds partnerships and works collaboratively with others to meet shared objectives.
- Communicates Effectively: Develops and delivers multi-mode communications that convey a clear understanding of the unique needs of different audiences.
- Customer Focus: Builds strong customer relationships and delivers customer-centric solutions.
- Decision Quality: Makes good and timely decisions that keep the organization moving forward.
- Data Extraction: Performs data extract-transform-load (ETL) activities from a variety of sources and transforms them for consumption by various downstream applications and users using appropriate tools and technologies.
- Programming: Creates, writes, and tests computer code, test scripts, and build scripts using algorithmic analysis and design, industry standards and tools, version control, and build and test automation to meet business, technical, security, governance, and compliance requirements.
- Quality Assurance Metrics: Applies the science of measurement to assess whether a solution meets its intended outcomes using the IT Operating Model (ITOM), including the SDLC standards, tools, metrics, and key performance indicators, to deliver a quality product.
- Solution Documentation: Documents information and solutions based on knowledge gained as part of product development activities; communicates to stakeholders with the goal of enabling improved productivity and effective knowledge transfer to others who were not originally part of the initial learning.
- Solution Validation Testing: Validates a configuration item change or solution using the function's defined best practices, including the Systems Development Life Cycle (SDLC) standards, tools, and metrics, to ensure that it works as designed and meets customer requirements.
- Data Quality: Identifies, understands, and corrects flaws in data to support effective information governance across operational business processes and decision making.
- Problem Solving: Solves problems, and may mentor others on effective problem solving, by using a systematic analysis process and leveraging industry-standard methodologies to create problem traceability and protect the customer; determines the assignable cause; implements robust, data-based solutions; identifies the systemic root causes and ensures actions to prevent problem recurrence are implemented.
- Values Differences: Recognizes the value that different perspectives and cultures bring to an organization.

Education, Licenses, Certifications:
- College, university, or equivalent degree in a relevant technical discipline, or relevant equivalent experience, required.
- This position may require licensing for compliance with export controls or sanctions regulations.

Experience:
Intermediate experience in a relevant discipline area is required. Knowledge of the latest technologies and trends in data engineering is highly preferred and includes:
- 5-8 years of experience
- Familiarity analyzing complex business systems, industry requirements, and/or data regulations
- Background in processing and managing large data sets
- Design and development for a Big Data platform using open-source and third-party tools
- Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka, or equivalent college coursework
- SQL query language
- Clustered compute cloud-based implementation experience
- Experience developing applications requiring large file movement for a cloud-based environment and other data extraction tools and methods from a variety of sources
- Experience in building analytical solutions

Intermediate experience in the following is preferred:
- Experience with IoT technology
- Experience in Agile software development

Additional Responsibilities Unique to this Position:
1) Work closely with the business Product Owner to understand the product vision.
2) Play a key role across DBU Data & Analytics Power Cells to define and develop data pipelines for efficient data transport into Cummins Digital Core (Azure Data Lake, Snowflake).
3) Collaborate closely with AAI Digital Core and AAI Solutions Architecture to ensure alignment of DBU project data pipeline design standards.
4) Independently design, develop, test, and implement complex data pipelines from transactional systems (ERP, CRM) to data warehouses and the data lake.
5) Responsible for the creation, maintenance, and management of DBU Data & Analytics data engineering documentation and standard operating procedures (SOPs).
6) Take part in the evaluation of new data tools and POCs and provide suggestions.
7) Take full ownership of the developed data pipelines, providing ongoing support for enhancements and performance optimization.
8) Proactively address and resolve issues that compromise data accuracy and usability.

Preferred Skills:
1. Programming Languages: Proficiency in languages such as Python, Java, and/or Scala.
2. Database Management: Expertise in SQL and NoSQL databases.
3. Big Data Technologies: Experience with Hadoop, Spark, Kafka, and other big data frameworks.
4. Cloud Services: Experience with Azure, Databricks, and AWS cloud platforms.
5. ETL Processes: Strong understanding of Extract, Transform, Load (ETL) processes.
6. Data Replication: Working knowledge of replication technologies such as Qlik Replicate is a plus.
7. API: Working knowledge of APIs to consume data from ERP and CRM systems.
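As a rough illustration of preferred skill 7 above (consuming data from ERP/CRM APIs into the lake), the sketch below pages through a hypothetical REST endpoint with the requests library and lands the batch as timestamped JSON in a raw zone; the URL, payload shape, and mount path are all assumptions.

```python
import json
from datetime import datetime, timezone

import requests  # pip install requests

# Hypothetical CRM REST endpoint and lake landing path, for illustration only.
CRM_URL = "https://crm.example.com/api/v1/accounts"
LANDING_DIR = "/mnt/datalake/raw/crm/accounts"


def extract_accounts(page_size: int = 500) -> list[dict]:
    """Page through the CRM API and return all account records."""
    records, page = [], 1
    while True:
        resp = requests.get(
            CRM_URL, params={"page": page, "page_size": page_size}, timeout=30
        )
        resp.raise_for_status()
        batch = resp.json().get("results", [])
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records


def land_as_json(records: list[dict]) -> str:
    """Write the batch to the raw zone with a load-timestamped file name."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = f"{LANDING_DIR}/accounts_{stamp}.json"
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh)
    return path


if __name__ == "__main__":
    landed = land_as_json(extract_accounts())
    print("landed batch at", landed)
```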

Posted Date not available

Apply

5.0 - 8.0 years

5 - 10 Lacs

Pune

Work from Office

Job Summary:
Leads projects for the design, development, and maintenance of a data and analytics platform. Effectively and efficiently processes, stores, and makes data available to analysts and other consumers. Works with key business stakeholders, IT experts, and subject-matter experts to plan, design, and deliver optimal analytics and data science solutions. Works on one or many product teams at a time.

Key Responsibilities:
- Designs and automates deployment of our distributed system for ingesting and transforming data from various types of sources (relational, event-based, unstructured).
- Designs and implements a framework to continuously monitor and troubleshoot data quality and data integrity issues.
- Implements data governance processes and methods for managing metadata, access, and retention of data for internal and external users.
- Designs and provides guidance on building reliable, efficient, scalable, and quality data pipelines with monitoring and alert mechanisms that combine a variety of sources using ETL/ELT tools or scripting languages.
- Designs and implements physical data models to define the database structure; optimizes database performance through efficient indexing and table relationships.
- Participates in optimizing, testing, and troubleshooting of data pipelines.
- Designs, develops, and operates large-scale data storage and processing solutions using different distributed and cloud-based platforms for storing data (e.g., Data Lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB, others).
- Uses innovative and modern tools, techniques, and architectures to partially or completely automate the most common, repeatable, and tedious data preparation and integration tasks in order to minimize manual and error-prone processes and improve productivity.
- Assists with renovating the data management infrastructure to drive automation in data integration and management.
- Ensures the timeliness and success of critical analytics initiatives by using agile development technologies such as DevOps, Scrum, and Kanban.
- Coaches and develops less experienced team members.

External Qualifications and Competencies

Competencies:
- System Requirements Engineering: Uses appropriate methods and tools to translate stakeholder needs into verifiable requirements to which designs are developed; establishes acceptance criteria for the system of interest through analysis, allocation, and negotiation; tracks the status of requirements throughout the system lifecycle; assesses the impact of changes to system requirements on project scope, schedule, and resources; creates and maintains information linkages to related artifacts.
- Collaborates: Builds partnerships and works collaboratively with others to meet shared objectives.
- Communicates Effectively: Develops and delivers multi-mode communications that convey a clear understanding of the unique needs of different audiences.
- Customer Focus: Builds strong customer relationships and delivers customer-centric solutions.
- Decision Quality: Makes good and timely decisions that keep the organization moving forward.
- Data Extraction: Performs data extract-transform-load (ETL) activities from a variety of sources and transforms them for consumption by various downstream applications and users using appropriate tools and technologies.
- Programming: Creates, writes, and tests computer code, test scripts, and build scripts using algorithmic analysis and design, industry standards and tools, version control, and build and test automation to meet business, technical, security, governance, and compliance requirements.
- Quality Assurance Metrics: Applies the science of measurement to assess whether a solution meets its intended outcomes using the IT Operating Model (ITOM), including the SDLC standards, tools, metrics, and key performance indicators, to deliver a quality product.
- Solution Documentation: Documents information and solutions based on knowledge gained as part of product development activities; communicates to stakeholders with the goal of enabling improved productivity and effective knowledge transfer to others who were not originally part of the initial learning.
- Solution Validation Testing: Validates a configuration item change or solution using the function's defined best practices, including the Systems Development Life Cycle (SDLC) standards, tools, and metrics, to ensure that it works as designed and meets customer requirements.
- Data Quality: Identifies, understands, and corrects flaws in data to support effective information governance across operational business processes and decision making.
- Problem Solving: Solves problems, and may mentor others on effective problem solving, by using a systematic analysis process and leveraging industry-standard methodologies to create problem traceability and protect the customer; determines the assignable cause; implements robust, data-based solutions; identifies the systemic root causes and ensures actions to prevent problem recurrence are implemented.
- Values Differences: Recognizes the value that different perspectives and cultures bring to an organization.

Education, Licenses, Certifications:
- College, university, or equivalent degree in a relevant technical discipline, or relevant equivalent experience, required.
- This position may require licensing for compliance with export controls or sanctions regulations.

Experience:
Intermediate experience in a relevant discipline area is required. Knowledge of the latest technologies and trends in data engineering is highly preferred and includes:
- 5-8 years of experience
- Familiarity analyzing complex business systems, industry requirements, and/or data regulations
- Background in processing and managing large data sets
- Design and development for a Big Data platform using open-source and third-party tools
- Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka, or equivalent college coursework
- SQL query language
- Clustered compute cloud-based implementation experience
- Experience developing applications requiring large file movement for a cloud-based environment and other data extraction tools and methods from a variety of sources
- Experience in building analytical solutions

Intermediate experience in the following is preferred:
- Experience with IoT technology
- Experience in Agile software development

Additional Responsibilities Unique to this Position:
1) Work closely with the business Product Owner to understand the product vision.
2) Play a key role across DBU Data & Analytics Power Cells to define and develop data pipelines for efficient data transport into Cummins Digital Core (Azure Data Lake, Snowflake).
3) Collaborate closely with AAI Digital Core and AAI Solutions Architecture to ensure alignment of DBU project data pipeline design standards.
4) Independently design, develop, test, and implement complex data pipelines from transactional systems (ERP, CRM) to data warehouses and the data lake.
5) Responsible for the creation, maintenance, and management of DBU Data & Analytics data engineering documentation and standard operating procedures (SOPs).
6) Take part in the evaluation of new data tools and POCs and provide suggestions.
7) Take full ownership of the developed data pipelines, providing ongoing support for enhancements and performance optimization.
8) Proactively address and resolve issues that compromise data accuracy and usability.

Preferred Skills:
1. Programming Languages: Proficiency in languages such as Python, Java, and/or Scala.
2. Database Management: Expertise in SQL and NoSQL databases.
3. Big Data Technologies: Experience with Hadoop, Spark, Kafka, and other big data frameworks.
4. Cloud Services: Experience with Azure, Databricks, and AWS cloud platforms.
5. ETL Processes: Strong understanding of Extract, Transform, Load (ETL) processes.
6. Data Replication: Working knowledge of replication technologies such as Qlik Replicate is a plus.
7. API: Working knowledge of APIs to consume data from ERP and CRM systems.

Posted Date not available

Apply