
310 Data Lake Jobs

JobPe aggregates listings for easy access, but you apply directly on the original job portal.

6.0 - 10.0 years

10 - 20 Lacs

Chennai

Work from Office


Do you love leading data-driven transformations and mentoring teams in building scalable data platforms? We're looking for a Data Tech Lead to drive innovation, architecture, and execution across our data ecosystem.

Your Role:
• Lead the design and implementation of modern data architecture, ETL/ELT pipelines, and data lakes/warehouses
• Set technical direction and mentor a team of talented data engineers
• Collaborate with product, analytics, and engineering teams to translate business needs into data solutions
• Define and enforce data modeling standards, governance, and naming conventions
• Take ownership of the end-to-end data lifecycle: ingestion, transformation, storage, access, and monitoring
• Evaluate and implement the right cloud/on-prem tools and frameworks
• Troubleshoot and resolve complex data challenges while optimizing for performance and cost
• Contribute to documentation, design blueprints, and knowledge sharing

We're Looking For Someone With:
• Proven experience in leading data engineering or data platform teams
• Expertise in designing scalable data architectures and modern data stacks
• Strong hands-on experience with cloud platforms (AWS/Azure/GCP) and big data tools
• Proficiency in Python, SQL, Spark, Databricks, or similar tools
• A passion for clean code, performance tuning, and high-impact delivery
• Strong communication, collaboration, and leadership skills
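For illustration, a minimal PySpark sketch of the kind of ETL/ELT pipeline this role covers; the dataset, paths, and column names are hypothetical rather than taken from the posting.

```python
# Minimal ETL sketch: ingest raw JSON, cleanse, and publish curated Parquet.
# All paths, columns, and the "orders" dataset are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Ingest: raw files landed in the lake's raw zone (hypothetical location)
raw = spark.read.json("s3a://raw-zone/orders/2024/*.json")

# Transform: enforce the kind of modeling standards a tech lead would define
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Load: partitioned Parquet in the curated zone for downstream consumers
orders.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://curated-zone/orders/"
)
```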

Posted 1 day ago

Apply

3.0 - 6.0 years

3 - 6 Lacs

Pune

Work from Office


Capgemini Invent

Capgemini Invent is the digital innovation, consulting and transformation brand of the Capgemini Group, a global business line that combines market-leading expertise in strategy, technology, data science and creative design to help CxOs envision and build what's next for their businesses.

Your role
• Use design thinking and a consultative approach to conceive cutting-edge technology solutions for business problems, mining core insights as a service model
• Engage with project activities across the information lifecycle, often related to paradigms like: building and managing business data lakes and ingesting data streams to prepare data; developing machine learning and predictive models to analyse data; visualizing data; empowering information consumers with agile data models that enable self-service BI; specializing in business models and architectures across various industry verticals
• Participate in business requirements / functional specification definition, scope management, data analysis and design, in collaboration with both business stakeholders and IT teams
• Document detailed business requirements, develop solution design and specifications
• Support and coordinate system implementations through the project lifecycle, working with other teams on a local and global basis
• Work closely with the solutions architecture team to define the target detailed solution to deliver the business requirements

Your Profile
• B.E. / B.Tech. + MBA (Systems / Data / Data Science / Analytics / Finance) with a good academic background
• Strong communication, facilitation, relationship-building, presentation, and negotiation skills
• A flair for storytelling and the ability to present interesting insights from the data
• Good soft skills: communication, proactiveness, self-learning, etc.
• Flexibility to adapt to the dynamically changing needs of the industry
• Good exposure to database management systems; knowledge of the big data ecosystem (e.g., Hadoop) is good to have
• Hands-on with SQL and good knowledge of NoSQL databases
• Working knowledge of R/Python is good to have
• Exposure to or knowledge of one of the cloud ecosystems: Google / AWS / Azure

What you will love about working here
We recognize the significance of flexible work arrangements to provide support. Be it remote work or flexible work hours, you will get an environment to maintain a healthy work-life balance. At the heart of our mission is your career growth. Our array of career growth programs and diverse professions are crafted to support you in exploring a world of opportunities. Equip yourself with valuable certifications in the latest technologies such as Generative AI.

About Capgemini
Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fueled by its market-leading capabilities in AI, cloud and data, combined with its deep industry expertise and partner ecosystem. The Group reported 2023 global revenues of €22.5 billion.

Posted 1 day ago

Apply

1.0 - 2.0 years

4 - 8 Lacs

Bengaluru

Work from Office


About the Role: Grade Level (for internal use): 08

Role: Python Developer

A data collections analyst is responsible for gathering, organizing, and analyzing data from various sources. They work closely with different teams to understand their data needs and develop efficient data collection processes. The analyst must have strong technical skills in data harvesting, data science, and automation. They should also be proficient in programming languages like Python and have experience with data structures and multi-threading. The analyst will collaborate with other developers, participate in requirement gathering, and contribute to project planning activities. They should be able to work independently, monitor project status, and identify any issues that may impact the project's goals. The analyst must continuously learn and expand their knowledge in their area of specialization. Overall, the role requires a strong aptitude for problem-solving, attention to detail, and the ability to work in a dynamic environment.

The Team: The Sourcing Automation Team specializes in automating content extraction from various sources including web, APIs, SFTPs, and the cloud. We then transform and process this content to ensure its value and deliverability to end users through end-to-end automation. Our team has strong expertise in Robotic Process Automation (RPA) and utilizes technologies such as Python, AWS, Azure, SharePoint, APIs, Kafka, data lakes, and advanced Excel functionalities to enhance our automation solutions.

Responsibilities and Impact:
• Work as part of an RPA development team to design, estimate, develop and implement software solutions that satisfy the business requirements
• Strong technical skill in data harvesting for multiple regions and multilingual websites, with ease of maintenance
• Strong programming skills in Python Selenium automation, preferably in a Windows environment
• Conduct research and stay updated with the latest advancements in generative AI and automation technologies
• Exposure to OOPS, data structures, multi-threading, and the Selenium tool
• Working knowledge of configuration management systems like Git and build tools
• Able to work on different technologies and adaptive enough to learn new technologies as per project needs

What We're Looking For:

Basic Required Qualifications:
• BE degree in Computer Science or a related field, and 1 to 2 years of experience in programming
• Strong Python skills, plus AI, ML, data harvesting, data science, data capture, NLP, automation, JavaScript, TypeScript, HTML, JSON, OOPS, data structures, multi-threading
• Experience with relational or SQL databases
• Experience with RESTful web services

Additional Preferred Qualifications:
• Experience with C#, .NET Core, design patterns
• AI and ML
• Capable of performing tasks in a dynamic/changing environment

What's In It For You

Our Purpose: Progress is not a self-starter. It requires a catalyst to be set in motion. Information, imagination, people, technology: the right combination can unlock possibility and change the world. Our world is in transition and getting more complex by the day. We push past expected observations and seek out new levels of understanding so that we can help companies, governments and individuals make an impact on tomorrow. At S&P Global we transform data into Essential Intelligence, pinpointing risks and opening possibilities. We Accelerate Progress.

Our People: Our Values: Integrity, Discovery, Partnership. At S&P Global, we focus on Powering Global Markets. Throughout our history, the world's leading organizations have relied on us for the Essential Intelligence they need to make confident decisions about the road ahead. We start with a foundation of integrity in all we do, bring a spirit of discovery to our work, and collaborate in close partnership with each other and our customers to achieve shared goals.

Benefits: We take care of you, so you can take care of business. We care about our people. That's why we provide everything you and your career need to thrive at S&P Global.
• Health & Wellness: Health care coverage designed for the mind and body
• Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills
• Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs
• Family Friendly Perks: It's not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families
• Beyond the Basics: From retail discounts to referral incentive awards, small perks can make a big difference
For more information on benefits by country visit https://spgbenefits.com/benefit-summaries

Global Hiring and Opportunity at S&P Global: At S&P Global, we are committed to fostering a connected and engaged workplace where all individuals have access to opportunities based on their skills, experience, and contributions. Our hiring practices emphasize fairness, transparency, and merit, ensuring that we attract and retain top talent. By valuing different perspectives and promoting a culture of respect and collaboration, we drive innovation and power global markets.

Recruitment Fraud Alert: If you receive an email from a spglobalind.com domain or any other regionally based domains, it is a scam and should be reported to reportfraud@spglobal.com. S&P Global never requires any candidate to pay money for job applications, interviews, offer letters, pre-employment training or for equipment/delivery of equipment. Stay informed and protect yourself from recruitment fraud by reviewing our guidelines, fraudulent domains, and how to report suspicious activity here.

Equal Opportunity Employer: S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment. If you need an accommodation during the application process due to a disability, please send an email to EEO.Compliance@spglobal.com and your request will be forwarded to the appropriate person.

US Candidates Only: The EEO is the Law Poster (http://www.dol.gov/ofccp/regs/compliance/posters/pdf/eeopost.pdf) describes discrimination protections under federal law. Pay Transparency Nondiscrimination Provision: https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf

20 - Professional (EEO-2 Job Categories-United States of America), IFTECH203 - Entry Professional (EEO Job Group), SWP Priority Ratings - (Strategic Workforce Planning)

Posted 1 day ago

Apply

3.0 - 4.0 years

8 - 13 Lacs

Noida, Gurugram

Work from Office


R1 India is proud to be recognized amongst the Top 25 Best Companies to Work For 2024 by the Great Place to Work Institute. This is our second consecutive recognition on this prestigious Best Workplaces list, building on the Top 50 recognition we achieved in 2023. Our focus on employee wellbeing, inclusion and diversity is demonstrated through prestigious recognitions, with R1 India ranked amongst Best in Healthcare, Top 100 Best Companies for Women by Avtar & Seramount, and Top 10 Best Workplaces in Health & Wellness. We are committed to transforming the healthcare industry with our innovative revenue cycle management services. Our goal is to make healthcare work better for all by enabling efficiency for healthcare systems, hospitals, and physician practices. With over 30,000 employees globally, we are about 16,000+ strong in India with presence in Delhi NCR, Hyderabad, Bangalore, and Chennai. Our inclusive culture ensures that every employee feels valued, respected, and appreciated with a robust set of employee benefits and engagement activities.

Position Title: Specialist
Reports to: Program Manager - Analytics BI
Location: Noida

Position summary: The Specialist will work with the development team and be responsible for development tasks as an individual contributor. He/she should be technically sound and able to communicate clearly with clients.

Key duties & responsibilities:
• Work as a Specialist on data engineering projects for E2E analytics
• Ensure project delivery on time
• Mentor other teammates and guide them
• Take requirements from the client and communicate with them directly
• Ensure timely creation of documents for knowledge bases, user guides, and other communications systems
• Ensure delivery against business needs, team goals and objectives, i.e., meeting commitments and coordinating the overall schedule
• Work with large datasets in various formats, integrity/QA checks, and reconciliation for accounting systems
• Lead efforts to troubleshoot and solve process or system related issues
• Understand, support, enforce and comply with company policies, procedures and Standards of Business Ethics and Conduct
• Experience working with Agile methodology

Experience, Skills and Knowledge:
• Bachelor's degree in computer science or equivalent experience is required; B.Tech/MCA preferable
• Minimum 3-4 years of experience
• Excellent communication skills and a strong commitment to delivering the highest level of service

Technical Skills:
• Expert knowledge and experience working with Spark and Scala
• Experience with Azure Data Factory, Azure Databricks, Data Lake
• Experience working with SQL and Snowflake
• Experience with data integration tools such as SSIS, ADF
• Experience with programming languages such as Python
• Expertise in Astronomer Airflow
• Experience or exposure to Microsoft Azure Data Fundamentals

Key competency profile:
• Own your development by implementing and sharing your learnings
• Motivate each other to perform at our highest level
• Work the right way by acting with integrity and living our values every day
• Succeed by proactively identifying problems and solutions for yourself and others
• Communicate effectively when there are challenges
• Demonstrate accountability and responsibility

Working in an evolving healthcare setting, we use our shared expertise to deliver innovative solutions. Our fast-growing team has opportunities to learn and grow through rewarding interactions, collaboration and the freedom to explore professional interests. Our associates are given valuable opportunities to contribute, to innovate and create meaningful work that makes an impact in the communities we serve around the world. We also offer a culture of excellence that drives customer success and improves patient care. We believe in giving back to the community and offer a competitive benefits package. To learn more, visit r1rcm.com or visit us on Facebook.

Posted 1 day ago

Apply

5.0 - 7.0 years

9 - 14 Lacs

Noida, Gurugram

Work from Office


R1 India is proud to be recognized amongst the Top 25 Best Companies to Work For 2024 by the Great Place to Work Institute. This is our second consecutive recognition on this prestigious Best Workplaces list, building on the Top 50 recognition we achieved in 2023. Our focus on employee wellbeing, inclusion and diversity is demonstrated through prestigious recognitions, with R1 India ranked amongst Best in Healthcare, Top 100 Best Companies for Women by Avtar & Seramount, and Top 10 Best Workplaces in Health & Wellness. We are committed to transforming the healthcare industry with our innovative revenue cycle management services. Our goal is to make healthcare work better for all by enabling efficiency for healthcare systems, hospitals, and physician practices. With over 30,000 employees globally, we are about 16,000+ strong in India with presence in Delhi NCR, Hyderabad, Bangalore, and Chennai. Our inclusive culture ensures that every employee feels valued, respected, and appreciated with a robust set of employee benefits and engagement activities.

Position Title: Senior Specialist
Reports to: Program Manager - Analytics BI

Position summary: The Senior Specialist will work with the development team and be responsible for development tasks as an individual contributor. He/she should be able to mentor the team and help in resolving issues, and should be technically sound and able to communicate clearly with clients.

Key duties & responsibilities:
• Work as Lead Developer on data engineering projects for E2E analytics
• Ensure project delivery on time
• Mentor other teammates and guide them
• Take requirements from the client and communicate with them directly
• Ensure timely creation of documents for knowledge bases, user guides, and other communications systems
• Ensure delivery against business needs, team goals and objectives, i.e., meeting commitments and coordinating the overall schedule
• Work with large datasets in various formats, integrity/QA checks, and reconciliation for accounting systems
• Lead efforts to troubleshoot and solve process or system related issues
• Understand, support, enforce and comply with company policies, procedures and Standards of Business Ethics and Conduct
• Experience working with Agile methodology

Experience, Skills and Knowledge:
• Bachelor's degree in Computer Science or equivalent experience is required; B.Tech/MCA preferable
• Minimum 5-7 years of experience
• Excellent communication skills and a strong commitment to delivering the highest level of service

Technical Skills:
• Expert knowledge and experience working with Spark and Scala
• Experience with Azure Data Factory, Azure Databricks, Data Lake
• Experience working with SQL and Snowflake
• Experience with data integration tools such as SSIS, ADF
• Experience with programming languages such as Python
• Expertise in Astronomer Airflow
• Experience or exposure to Microsoft Azure Data Fundamentals

Key competency profile:
• Own your development by implementing and sharing your learnings
• Motivate each other to perform at our highest level
• Work the right way by acting with integrity and living our values every day
• Succeed by proactively identifying problems and solutions for yourself and others
• Communicate effectively when there are challenges
• Demonstrate accountability and responsibility

Working in an evolving healthcare setting, we use our shared expertise to deliver innovative solutions. Our fast-growing team has opportunities to learn and grow through rewarding interactions, collaboration and the freedom to explore professional interests. Our associates are given valuable opportunities to contribute, to innovate and create meaningful work that makes an impact in the communities we serve around the world. We also offer a culture of excellence that drives customer success and improves patient care. We believe in giving back to the community and offer a competitive benefits package. To learn more, visit r1rcm.com or visit us on Facebook.

Posted 1 day ago

Apply

3.0 - 8.0 years

11 - 16 Lacs

Bengaluru

Work from Office


As a Data Engineer, you are required to:
• Design, build, and maintain data pipelines that efficiently process and transport data from various sources to storage systems or processing environments, while ensuring data integrity, consistency, and accuracy across the entire data pipeline
• Integrate data from different systems, often involving data cleaning, transformation (ETL), and validation
• Design the structure of databases and data storage systems, including the design of schemas, tables, and relationships between datasets, to enable efficient querying
• Work closely with data scientists, analysts, and other stakeholders to understand their data needs and ensure that the data is structured in a way that makes it accessible and usable
• Stay up-to-date with the latest trends and technologies in the data engineering space, such as new data storage solutions, processing frameworks, and cloud technologies; evaluate and implement new tools to improve data engineering processes

Qualification: Bachelor's or Master's in Computer Science & Engineering, or equivalent. A professional degree in Data Science or Engineering is desirable.

Experience level: At least 3-5 years of hands-on experience in Data Engineering.

Desired Knowledge & Experience:
• Spark: Spark 3.x, RDD/DataFrames/SQL, Batch/Structured Streaming; knowledge of Spark internals (Catalyst/Tungsten/Photon)
• Databricks: Workflows, SQL Warehouses/Endpoints, DLT, Pipelines, Unity, Autoloader
• IDE: IntelliJ/PyCharm, Git, Azure DevOps, GitHub Copilot
• Test: pytest, Great Expectations
• CI/CD: YAML Azure Pipelines, Continuous Delivery, Acceptance Testing
• Big Data Design: Lakehouse/Medallion Architecture, Parquet/Delta, Partitioning, Distribution, Data Skew, Compaction
• Languages: Python / Functional Programming (FP)
• SQL: T-SQL/Spark SQL/HiveQL
• Storage: Data Lake and Big Data Storage Design

Additionally, it is helpful to know the basics of:
• Data Pipelines: ADF/Synapse Pipelines/Oozie/Airflow
• Languages: Scala, Java
• NoSQL: Cosmos, Mongo, Cassandra
• Cubes: SSAS (ROLAP, HOLAP, MOLAP), AAS, Tabular Model
• SQL Server: T-SQL, Stored Procedures
• Hadoop: HDInsight/MapReduce/HDFS/YARN/Oozie/Hive/HBase/Ambari/Ranger/Atlas/Kafka
• Data Catalog: Azure Purview, Apache Atlas, Informatica

Required Soft Skills & Other Capabilities:
• Great attention to detail and good analytical abilities
• Good planning and organizational skills
• Collaborative approach to sharing ideas and finding solutions
• Ability to work independently as well as in a global team environment
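As a hedged illustration of the lakehouse/medallion items above, the sketch below promotes hypothetical bronze events to a partitioned silver table; paths and columns are made up, and Delta Lake support (as on Databricks or with delta-spark configured) is assumed.

```python
# Bronze -> silver medallion hop; table paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze: raw events appended as-is from the landing zone
bronze = spark.read.format("delta").load("/lake/bronze/events")

# Silver: deduplicated, typed, and partitioned for efficient querying
silver = (
    bronze.dropDuplicates(["event_id"])
          .withColumn("event_date", F.to_date("event_ts"))
)

(silver.write.format("delta")
       .mode("overwrite")
       .partitionBy("event_date")  # partitioning choice drives skew/compaction behavior
       .save("/lake/silver/events"))
```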

Posted 1 day ago

Apply

7.0 - 12.0 years

14 - 18 Lacs

Noida

Work from Office


Who We Are: Build a brighter future while learning and growing with a Siemens company at the intersection of technology, community and sustainability. Our global team of innovators is always looking to create meaningful solutions to some of the toughest challenges facing our world. Find out how far your passion can take you.

What you need:
* BS in an Engineering or Science discipline, or equivalent experience
* 7+ years of software/data engineering experience using Java, Scala, and/or Python, with at least 5 years' experience in a data-focused role
* Experience in data integration (ETL/ELT) development using multiple languages (e.g., Java, Scala, Python, PySpark, SparkSQL)
* Experience building and maintaining data pipelines supporting a variety of integration patterns (batch, replication/CDC, event streaming) and data lake/warehouse in production environments
* Experience with AWS-based data services technologies (e.g., Kinesis, Glue, RDS, Athena, etc.) and Snowflake CDW
* Experience working on larger initiatives building and rationalizing large-scale data environments with a wide variety of data pipelines, possibly with internal and external partner integrations, would be a plus
* Willingness to experiment and learn new approaches and technology applications
* Knowledge and experience with various relational databases and demonstrable proficiency in SQL, supporting analytics uses and users
* Knowledge of software engineering and agile development best practices
* Excellent written and verbal communication skills

The Brightly culture: We're guided by a vision of community that serves the ambitions and wellbeing of all people, and our professional communities are no exception. We model that ideal every day by being supportive, collaborative partners to one another, conscientiously making space for our colleagues to grow and thrive. Our passionate team is driven to create a future where smarter infrastructure protects the environments that shape and connect us all. That brighter future starts with us.

Posted 1 day ago

Apply

8.0 - 13.0 years

8 - 12 Lacs

Bengaluru

Work from Office


Hello Talented Techie! We provide support in Project Services and Transformation, Digital Solutions and Delivery Management. We offer joint operations and digitalization services for Global Business Services and work closely alongside the entire Shared Services organization. We make optimal use of the possibilities of new technologies such as Business Process Management (BPM) and Robotics as enablers for efficient and effective processes.

We are looking for a Sr. AWS Cloud Architect.

• Architect and Design: Develop scalable and efficient data solutions using AWS services such as AWS Glue, Amazon Redshift, S3, Kinesis (Apache Kafka), DynamoDB, Lambda, AWS Glue (Streaming ETL) and EMR
• Integration: Integrate real-time data from various Siemens organizations into our data lake, ensuring seamless data flow and processing
• Data Lake Management: Design and manage a large-scale data lake using AWS services like S3, Glue, and Lake Formation
• Data Transformation: Apply various data transformations to prepare data for analysis and reporting, ensuring data quality and consistency
• Snowflake Integration: Implement and manage data pipelines to load data into Snowflake, utilizing Iceberg tables for optimal performance and flexibility
• Performance Optimization: Optimize data processing pipelines for performance, scalability, and cost-efficiency
• Security and Compliance: Ensure that all solutions adhere to security best practices and compliance requirements
• Collaboration: Work closely with cross-functional teams, including data engineers, data scientists, and application developers, to deliver end-to-end solutions
• Monitoring and Troubleshooting: Implement monitoring solutions to ensure the reliability and performance of data pipelines; troubleshoot and resolve any issues that arise

You'd describe yourself as:
• Experience: 8+ years of experience in data engineering or cloud solutioning, with a focus on AWS services
• Technical Skills: Proficiency in AWS services such as AWS API, AWS Glue, Amazon Redshift, S3, Apache Kafka and Lake Formation; experience with real-time data processing and streaming architectures
• Big Data Querying Tools: Strong knowledge of big data querying tools (e.g., Hive, PySpark)
• Programming: Strong programming skills in languages such as Python, Java, or Scala for building and maintaining scalable systems
• Problem-Solving: Excellent problem-solving skills and the ability to troubleshoot complex issues
• Communication: Strong communication skills, with the ability to work effectively with both technical and non-technical stakeholders
• Certifications: AWS certifications are a plus

Create a better #TomorrowWithUs! This role, based in Bangalore, is an individual contributor position. You may be required to visit other locations within India and internationally. In return, you'll have the opportunity to work with teams shaping the future. At Siemens, we are a collection of over 312,000 minds building the future, one day at a time, worldwide. Find out more about Siemens careers at
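To ground the AWS Glue work described above, here is an illustrative Glue ETL job skeleton; the catalog database, table, and S3 target are hypothetical placeholders, not details from the posting.

```python
# Skeleton of an AWS Glue ETL job: read from the Glue Data Catalog,
# transform with Spark, and write curated Parquet to S3.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog (hypothetical database/table)
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Transform with plain Spark, then land curated Parquet in S3
df = dyf.toDF().dropDuplicates(["order_id"])
df.write.mode("overwrite").parquet("s3://curated-bucket/orders/")

job.commit()
```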

Posted 1 day ago

Apply

15.0 - 20.0 years

5 - 9 Lacs

Bengaluru

Work from Office


Project Role: Application Developer
Project Role Description: Design, build and configure applications to meet business process and application requirements.
Must have skills: AWS Glue
Good to have skills: Microsoft SQL Server, Python (Programming Language), Data Engineering
Minimum 5 year(s) of experience is required
Educational Qualification: 15 years full time education

Developing a customer insights platform that will provide an ID graph and digital customer view to help drive improvements in marketing decisions.

Responsibilities:
• Design, build, and maintain data pipelines using AWS services (Glue, Neptune, S3)
• Participate in code reviews, testing, and optimization of data pipelines
• Collaborate with stakeholders to understand data requirements and translate them into technical solutions

Requirements:
• Proven experience as a Senior Data Engineer / Data Architect, or in a similar role
• Knowledge of data governance and security practices
• Extensive experience with data lake technologies (NiFi, Spark, Hive Metastore, Object Storage, Delta Lake Framework)
• Extensive experience with AWS cloud services, including AWS Glue, Neptune, S3 and Lambda
• Experience with AWS Neptune or other graph database technologies
• Experience in data modelling and design
• Experience with event-driven architecture
• Experience with Python
• Experience with SQL
• Strong problem-solving skills and attention to detail
• Excellent communication and teamwork skills

Nice to have:
• Experience with observability solutions (Splunk, New Relic)
• Experience with Infrastructure as Code (Terraform, CloudFormation)
• Experience with CI/CD (Jenkins)
• Experience with Kubernetes
• Familiarity with data visualization tools

Support Engineer: Similar skills as the above, but with more of a support focus; able to troubleshoot, patch and upgrade, and make minor enhancements and fixes to the infrastructure and pipelines. Experience with observability (CloudWatch, New Relic) and monitoring.

Qualification: 15 years full time education

Posted 1 day ago

Apply

7.0 - 12.0 years

4 - 8 Lacs

Bengaluru

Work from Office


About the Role: We are seeking a highly skilled Data Engineer with deep expertise in PySpark and the Cloudera Data Platform (CDP) to join our data engineering team. As a Data Engineer, you will be responsible for designing, developing, and maintaining scalable data pipelines that ensure high data quality and availability across the organization. This role requires a strong background in big data ecosystems, cloud-native tools, and advanced data processing techniques. The ideal candidate has hands-on experience with data ingestion, transformation, and optimization on the Cloudera Data Platform, along with a proven track record of implementing data engineering best practices. You will work closely with other data engineers to build solutions that drive impactful business insights.

Responsibilities:
• Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and accuracy
• Data Ingestion: Implement and manage data ingestion processes from a variety of sources (e.g., relational databases, APIs, file systems) to the data lake or data warehouse on CDP
• Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business requirements
• Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing runtime of ETL processes
• Data Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline
• Automation and Orchestration: Automate data workflows using tools like Apache Oozie, Airflow, or similar orchestration tools within the Cloudera ecosystem

Education and Experience:
• Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field
• 3+ years of experience as a Data Engineer, with a strong focus on PySpark and the Cloudera Data Platform

Technical Skills:
• PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization techniques
• Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and HBase
• Data Warehousing: Knowledge of data warehousing concepts, ETL best practices, and experience with SQL-based tools (e.g., Hive, Impala)
• Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools
• Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration frameworks
• Scripting and Automation: Strong scripting skills in Linux
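For illustration, a small PySpark sketch of the kind of data-quality validation routine this role mentions; the dataset, columns, and thresholds are hypothetical.

```python
# Data-quality gate: count null keys and duplicates, fail the run on breach.
# Path, columns, and the 1% duplicate threshold are hypothetical choices.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("/data/lake/curated/customers")  # hypothetical path

total = df.count()
null_ids = df.filter(F.col("customer_id").isNull()).count()
dupes = total - df.dropDuplicates(["customer_id"]).count()

# Fail the pipeline run if quality thresholds are breached
if null_ids > 0 or dupes / max(total, 1) > 0.01:
    raise ValueError(
        f"DQ check failed: {null_ids} null ids, {dupes} duplicates out of {total}"
    )
```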

Posted 1 day ago

Apply

9.0 - 14.0 years

5 - 8 Lacs

Bengaluru

Work from Office


Kafka Data Engineer: a Data Engineer to build and manage data pipelines that support batch and streaming data solutions. The role requires expertise in creating seamless data flows across platforms like Data Lake/Lakehouse in Cloudera, Azure Databricks, and Kafka, for both batch and stream data pipelines.

Responsibilities:
• Develop, test, and maintain data pipelines (batch & stream) using Cloudera, Spark, Kafka and Azure services like ADF, Cosmos DB, Databricks, NoSQL DB/MongoDB, etc.
• Strong programming skills in Spark, Python or Scala, and SQL
• Optimize data pipelines to improve speed, performance, and reliability, ensuring that data is available for data consumers as required
• Create ETL pipelines for downstream consumers by transforming data as per business logic
• Work closely with Data Architects and Data Analysts to align data solutions with business needs and ensure the accuracy and accessibility of data
• Implement data validation checks and error-handling processes to maintain high data quality and consistency across data pipelines
• Strong analytical and problem-solving skills, with a focus on optimizing data flows and addressing impacts in the data pipeline

Qualifications:
• 8+ years of IT experience with at least 5+ years in data engineering and cloud-based data platforms
• Strong experience with Cloudera or another data lake, Confluent/Apache Kafka, and Azure Data Services (ADF, Databricks, Cosmos DB)
• Deep knowledge of NoSQL databases (Cosmos DB, MongoDB) and data modeling for performance and scalability
• Proven expertise in designing and implementing batch and streaming data pipelines using Databricks, Spark, or Kafka
• Experience in creating scalable, reliable, and high-performance data solutions with robust data governance policies
• Strong collaboration skills to work with stakeholders, mentor junior Data Engineers, and translate business needs into actionable solutions
• Bachelor's or Master's degree in Computer Science, IT, or a related field
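A hedged sketch of a Kafka-to-lake streaming pipeline like the one described; the broker address, topic, and paths are hypothetical, and the spark-sql-kafka package is assumed to be on the classpath.

```python
# Stream events from Kafka into the lake's bronze zone with checkpointing.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Read the stream from Kafka (hypothetical broker and topic)
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "orders")
         .load()
         .select(F.col("value").cast("string").alias("payload"))
)

# Land micro-batches as Parquet; the checkpoint makes the file sink fault-tolerant
query = (
    events.writeStream.format("parquet")
          .option("path", "/lake/bronze/orders")
          .option("checkpointLocation", "/lake/_checkpoints/orders")
          .start()
)
query.awaitTermination()
```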

Posted 1 day ago

Apply

8.0 - 13.0 years

5 - 10 Lacs

Hyderabad

Work from Office


Requirements:
• 6+ years of experience with Java Spark
• Strong understanding of distributed computing, big data principles, and batch/stream processing
• Proficiency in working with AWS services such as S3, EMR, Glue, Lambda, and Athena
• Experience with Data Lake architectures and handling large volumes of structured and unstructured data
• Familiarity with various data formats
• Strong problem-solving and analytical skills
• Excellent communication and collaboration abilities

Responsibilities:
• Design, develop, and optimize large-scale data processing pipelines using Java Spark
• Build scalable solutions to manage data ingestion, transformation, and storage in AWS-based Data Lake environments
• Collaborate with data architects and analysts to implement data models and workflows aligned with business requirements
• Ensure performance tuning, fault tolerance, and reliability of distributed data processing systems

Posted 1 day ago

Apply

8.0 - 13.0 years

5 - 10 Lacs

Mumbai

Work from Office


Sr Developer with 8 to 10 years of experience in Python and PySpark, along with hands-on experience with AWS data components such as AWS Glue and Athena. Good knowledge of data warehouse tools is needed to understand the existing system. The candidate should also have experience with Data Lake, Teradata, and Snowflake, and should be good at Terraform.
• 8-10 years of experience in designing and developing Python and PySpark applications
• Creating or maintaining data lake solutions using Snowflake, Teradata, and other data warehouse tools
• Good knowledge of and hands-on experience with AWS Glue, Athena, etc.
• Sound knowledge of all data lake concepts and the ability to work on data migration projects
• Providing ongoing support and maintenance for applications, including troubleshooting and resolving issues
• Expertise in practices like Agile, peer reviews, and CI/CD pipelines
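As a hedged illustration of the AWS Glue/Athena side of this role, a minimal boto3 sketch that runs an Athena query over an S3-backed data lake; the database, table, and bucket names are hypothetical.

```python
# Kick off an Athena query against a lake table; all names are hypothetical.
import boto3

athena = boto3.client("athena", region_name="ap-south-1")

resp = athena.start_query_execution(
    QueryString="SELECT order_date, sum(amount) FROM orders GROUP BY order_date",
    QueryExecutionContext={"Database": "lake_db"},
    ResultConfiguration={"OutputLocation": "s3://query-results-bucket/athena/"},
)
print("Started Athena query:", resp["QueryExecutionId"])
```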

Posted 1 day ago

Apply

8.0 - 13.0 years

5 - 10 Lacs

Hyderabad

Work from Office


Sr Developer with 8 to 10 years of experience in Python and PySpark, along with hands-on experience with AWS data components such as AWS Glue and Athena. Good knowledge of data warehouse tools is needed to understand the existing system. The candidate should also have experience with Data Lake, Teradata, and Snowflake, and should be good at Terraform.
• 8-10 years of experience in designing and developing Python and PySpark applications
• Creating or maintaining data lake solutions using Snowflake, Teradata, and other data warehouse tools
• Good knowledge of and hands-on experience with AWS Glue, Athena, etc.
• Sound knowledge of all data lake concepts and the ability to work on data migration projects
• Providing ongoing support and maintenance for applications, including troubleshooting and resolving issues
• Expertise in practices like Agile, peer reviews, and CI/CD pipelines

Posted 1 day ago

Apply

8.0 - 13.0 years

5 - 9 Lacs

Pune

Work from Office


Responsibilities / Qualifications:
• Candidate must have 5-6 years of IT working experience; at least 3 years of experience in an AWS Cloud environment is preferred
• Ability to understand the existing system architecture and work towards the target architecture
• Experience with data profiling activities, discovering data quality challenges, and documenting them
• Experience with development and implementation of a large-scale Data Lake and data analytics platform on the AWS Cloud platform
• Develop and unit test data pipeline architecture for data ingestion processes using AWS native services
• Experience with development on AWS Cloud using AWS data stores such as Redshift, RDS, S3, Glue Data Catalog, Lake Formation, Apache Airflow, Lambda, etc.
• Experience with development of a data governance framework, including the management of data, operating model, data policies and standards
• Experience with orchestration of workflows in an enterprise environment
• Working experience with Agile methodology
• Experience working with source code management tools such as AWS CodeCommit or GitHub
• Experience working with Jenkins or any CI/CD pipelines using AWS services
• Experience working with an onshore/offshore model and collaborating on deliverables
• Good communication skills to interact with the onshore team
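For the workflow-orchestration item above, a minimal Airflow DAG skeleton; the task logic and names are hypothetical placeholders.

```python
# Two-step ingestion DAG: pull raw files, then transform into the curated zone.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("pull source files into the raw S3 zone")  # placeholder logic

def transform():
    print("run Glue/Spark transformations into the curated zone")  # placeholder

with DAG(
    dag_id="lake_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_ingest >> t_transform  # transform runs only after ingest succeeds
```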

Posted 1 day ago

Apply

12.0 - 17.0 years

13 - 18 Lacs

Hyderabad

Work from Office


1. Data Engineer: Azure Data Services
2. Data Modelling: NoSQL and SQL
3. Good understanding of Spark and Spark Streaming
4. Hands-on with Python Pandas / Data Factory / Cosmos DB / Databricks / Event Hubs / Stream Analytics
5. Knowledge of medallion architecture, data vaults, data marts, etc.
6. Preferably Azure Data Associate exam certified

Posted 1 day ago

Apply

8.0 - 13.0 years

5 - 10 Lacs

Bengaluru

Work from Office


Requirements:
• 6+ years of experience with Java Spark
• Strong understanding of distributed computing, big data principles, and batch/stream processing
• Proficiency in working with AWS services such as S3, EMR, Glue, Lambda, and Athena
• Experience with Data Lake architectures and handling large volumes of structured and unstructured data
• Familiarity with various data formats
• Strong problem-solving and analytical skills
• Excellent communication and collaboration abilities

Responsibilities:
• Design, develop, and optimize large-scale data processing pipelines using Java Spark
• Build scalable solutions to manage data ingestion, transformation, and storage in AWS-based Data Lake environments
• Collaborate with data architects and analysts to implement data models and workflows aligned with business requirements
• Ensure performance tuning, fault tolerance, and reliability of distributed data processing systems

Posted 1 day ago

Apply

8.0 - 13.0 years

8 - 12 Lacs

Hyderabad

Work from Office


Requirements:
• 10+ years of experience with Java Spark
• Strong understanding of distributed computing, big data principles, and batch/stream processing
• Proficiency in working with AWS services such as S3, EMR, Glue, Lambda, and Athena
• Experience with Data Lake architectures and handling large volumes of structured and unstructured data
• Familiarity with various data formats
• Strong problem-solving and analytical skills
• Excellent communication and collaboration abilities

Responsibilities:
• Design, develop, and optimize large-scale data processing pipelines using Java Spark
• Build scalable solutions to manage data ingestion, transformation, and storage in AWS-based Data Lake environments
• Collaborate with data architects and analysts to implement data models and workflows aligned with business requirements

Posted 1 day ago

Apply

6.0 - 11.0 years

4 - 8 Lacs

Hyderabad

Work from Office


Sr Developer with 8 to 10 years of experience in Python and PySpark, along with hands-on experience with AWS data components such as AWS Glue and Athena. Good knowledge of data warehouse tools is needed to understand the existing system. The candidate should also have experience with Data Lake, Teradata, and Snowflake, and should be good at Terraform.
• 8-10 years of experience in designing and developing Python and PySpark applications
• Creating or maintaining data lake solutions using Snowflake, Teradata, and other data warehouse tools
• Good knowledge of and hands-on experience with AWS Glue, Athena, etc.
• Sound knowledge of all data lake concepts and the ability to work on data migration projects
• Providing ongoing support and maintenance for applications, including troubleshooting and resolving issues
• Expertise in practices like Agile, peer reviews, and CI/CD pipelines

Posted 1 day ago

Apply

2.0 - 5.0 years

4 - 8 Lacs

Bengaluru

Work from Office


Seeking a skilled Data Engineer to work on cloud-based data pipelines and analytics platforms. The ideal candidate will have hands-on experience in PySpark and AWS, with proficiency in designing Data Lakes and working with modern data orchestration tools.

Posted 1 day ago

Apply

8.0 - 13.0 years

4 - 8 Lacs

Mumbai

Work from Office


Sr Developer with 8 to 10 years of experience in Python and PySpark, along with hands-on experience with AWS data components such as AWS Glue and Athena. Good knowledge of data warehouse tools is needed to understand the existing system. The candidate should also have experience with Data Lake, Teradata, and Snowflake, and should be good at Terraform.
• 8-10 years of experience in designing and developing Python and PySpark applications
• Creating or maintaining data lake solutions using Snowflake, Teradata, and other data warehouse tools
• Good knowledge of and hands-on experience with AWS Glue, Athena, etc.
• Sound knowledge of all data lake concepts and the ability to work on data migration projects
• Providing ongoing support and maintenance for applications, including troubleshooting and resolving issues
• Expertise in practices like Agile, peer reviews, and CI/CD pipelines

Posted 1 day ago

Apply

8.0 - 13.0 years

10 - 15 Lacs

Hyderabad

Work from Office


1. 6+ years of experience as a DevOps/Build & Release Engineer in application configuration, code compilation, packaging, building, managing, and releasing code from one environment to another.
2. Proficient with container systems like Docker and container orchestration like EC2 Container Service and Kubernetes, including managed Docker orchestration and Docker containerization using Kubernetes.
3. Experienced working as a DevOps Engineer on various technologies/applications like SVN, GIT, Ant, Maven, Artifactory, Jenkins, OpenShift Containers (OCP), Chef, Docker, Kubernetes, Azure cloud, App Services, Function Apps, Storage Accounts, Data Lake, Event Hubs, Event Grids, Azure DevOps, and AWS DevOps.
4. Experienced in version control tools like TFS, VSTS, SVN, Bitbucket and GitHub.
5. Extensive experience using Maven and Ant as build tools for building deployable artifacts from source code.
6. Experience with Groovy, Jenkins, Azure DevOps, AWS DevOps, build pipelines, and release pipelines for continuous integration and end-to-end automation of all builds and deployments.

Version Control Systems: GIT, Subversion (SVN)
Operating Systems: RHEL Linux
CI/CD Tools: Jenkins, Artifactory, Nexus, Azure DevOps, Kafka
Containers: Docker, Kubernetes, Packer, OCP (OpenShift Container)
Scripting Languages: JavaScript, Unix shell scripting, PowerShell
Web Servers: Tomcat, Apache, IIS, JBoss, Spring Boot
Cloud Platforms: Azure, AWS
Databases: Oracle, MongoDB, Couchbase
Project Management Tools: MS Office, MS Project
Bug Tracking Tools: JIRA

Posted 1 day ago

Apply

5.0 - 10.0 years

4 - 8 Lacs

Hyderabad

Work from Office


Seeking a skilled Data Engineer to work on cloud-based data pipelines and analytics platforms. The ideal candidate will have hands-on experience in PySpark and AWS, with proficiency in designing Data Lakes and working with modern data orchestration tools.

Posted 1 day ago

Apply

8.0 - 12.0 years

10 - 20 Lacs

Mumbai, Pune

Hybrid


Role: Cloud Data Architect
Experience: 8 to 12 years
Location: Mumbai/Pune

As a Cloud Data Architect, you will design, implement, and evangelize scalable, secure data architectures in a cloud environment. In addition to driving technical excellence in client delivery, you will collaborate with our sales and pre-sales teams to develop reusable assets, accelerators, and artifacts that support RFP responses and help win new business.

Role Summary
• Architect, design, and deploy end-to-end cloud data solutions while also serving as a technical advisor in sales engagements.
• Create accelerators, solution blueprints, and artifacts that can be leveraged for client proposals and RFP responses.
• Collaborate across multiple teams to ensure seamless integration of technical solutions with client delivery and business goals.

Key Skills / Technologies
Must-Have:
• Cloud Platforms (AWS, Azure, or Google Cloud)
• Data Warehousing & Data Lakes (Redshift, BigQuery, Snowflake, etc.)
• Big Data Technologies (Hadoop, Spark, Kafka)
• SQL & NoSQL databases
• ETL/ELT tools and pipelines
• Data Modeling & Architecture design
Good-to-Have:
• Infrastructure-as-Code (Terraform, CloudFormation)
• Containerization & orchestration (Docker, Kubernetes)
• Programming languages (Python, Java, Scala)
• Data Governance and Security best practices

Responsibilities
Technical Architecture & Delivery:
• Design and build cloud-based data platforms, including data lakes, data warehouses, and real-time data pipelines.
• Ensure data quality, consistency, and security across all systems.
• Work closely with cross-functional teams to integrate diverse data sources into a cohesive architecture.
Sales & Pre-Sales Support:
• Develop technical assets, accelerators, and reference architectures that support RFP responses and sales proposals.
• Collaborate with sales teams to articulate technical solutions and demonstrate value to prospective clients.
• Present technical roadmaps and participate in client meetings to support business development efforts.
Cross-Functional Collaboration:
• Serve as a liaison between technical delivery teams and sales teams, ensuring alignment of strategies and seamless client hand-offs.
• Mentor team members and lead technical discussions on architecture best practices.

Required Qualifications
• Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
• 5+ years of experience in data architecture/design within cloud environments, with a track record of supporting pre-sales initiatives.
• Proven experience with cloud data platforms and big data technologies.
• Strong analytical, problem-solving, and communication skills.

Why Join Us
• Influence both our technical roadmap and sales strategy by shaping cutting-edge, cloud-first data solutions.
• Work with a dynamic, multi-disciplinary team and gain exposure to high-profile client engagements.
• Enjoy a culture of innovation, professional growth, and collaboration with competitive compensation and benefits.

Posted 1 day ago

Apply

5.0 - 10.0 years

7 - 12 Lacs

Mumbai

Work from Office


The candidate must possess knowledge relevant to the functional area, act as a subject matter expert in providing advice in the area of expertise, and focus on continuous improvement for maximum efficiency. It is vital to focus on a high standard of delivery excellence, provide top-notch service quality, and develop successful long-term business partnerships with internal/external customers by identifying and fulfilling customer needs. He/she should be able to break down complex problems into logical and manageable parts in a systematic way, generate and compare multiple options, and set priorities to resolve problems. The ideal candidate must be proactive and go beyond expectations to achieve job results and create new opportunities. He/she must positively influence the team, motivate high performance, promote a friendly climate, give constructive feedback, provide development opportunities, and manage the career aspirations of direct reports. Communication skills are key here, to explain organizational objectives, assignments, and the big picture to the team, and to articulate team vision and clear objectives.

Senior Process Manager Roles and responsibilities:
• Collaborate with stakeholders to gather and analyze business requirements
• Utilize data skills to extract, transform, and analyze data from various sources
• Interpret data to identify trends, patterns, and insights
• Generate comprehensive reports to present findings to stakeholders
• Document business processes, data flows, and requirements
• Assist in the development and implementation of data-driven solutions
• Conduct ad-hoc analysis as required to support business initiatives

Technical and Functional Skills:
• Bachelor's degree with 5+ years of experience, including 3+ years of hands-on experience as a Business Analyst or in a similar role
• Strong data skills with the ability to manipulate and analyze complex datasets
• Proficiency in interpreting data and translating findings into actionable insights
• Experience with report generation and data visualization tools
• Solid understanding of business processes and data flows
• Excellent communication and presentation skills
• Ability to work independently and collaboratively in a team environment
• Basic understanding of Google Cloud Platform (GCP), Tableau, SQL, and Python is a plus
• Certification in Business Analysis or a related field
• Familiarity with Google Cloud Platform (GCP) services and tools
• Experience with Tableau for data visualization
• Proficiency in SQL for data querying and manipulation
• Basic knowledge of Python for data analysis and automation
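As a small illustration of the Python analysis skills listed above, a pandas sketch of the kind of ad-hoc aggregation a stakeholder report might need; the bookings.csv file and its columns are hypothetical.

```python
# Monthly revenue roll-up from a hypothetical bookings extract.
import pandas as pd

df = pd.read_csv("bookings.csv", parse_dates=["booking_date"])

# Group bookings by calendar month and total the revenue column
monthly = df.groupby(df["booking_date"].dt.to_period("M"))["revenue"].sum()
print(monthly.head())
```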

Posted 1 day ago

Apply

Exploring Data Lake Jobs in India

The data lake job market in India is experiencing significant growth as organizations continue to invest in big data technologies to drive business insights and decision-making. Data lake professionals are in high demand across various industries, offering lucrative career opportunities for job seekers with relevant skills and experience.

Top Hiring Locations in India

  1. Bangalore
  2. Mumbai
  3. Pune
  4. Hyderabad
  5. Delhi/NCR

Average Salary Range

The average salary range for data lake professionals in India varies based on experience levels. Entry-level positions may start at around INR 4-6 lakhs per annum, while experienced professionals can earn upwards of INR 12-15 lakhs per annum.

Career Path

Typically, a career in data lake progresses from roles such as Data Engineer or Data Analyst to Senior Data Engineer, Data Architect, and eventually to a Data Science Manager or Chief Data Officer. Advancement in this field is often based on gaining experience working with large datasets, implementing data management best practices, and demonstrating strong problem-solving skills.

Related Skills

In addition to expertise in data lake technologies like Apache Hadoop, Apache Spark, and AWS S3, data lake professionals are often expected to have skills in data modeling, data warehousing, SQL, programming languages like Python or Java, and experience with ETL (Extract, Transform, Load) processes.

Interview Questions

  • What is a data lake and how does it differ from a data warehouse? (basic)
  • Explain the components of Hadoop ecosystem and their roles in data processing. (medium)
  • How do you ensure data quality and consistency in a data lake environment? (medium)
  • What are the key challenges of managing metadata in a data lake? (advanced)
  • Can you explain how data partitioning works in Apache Spark? (medium; see the sketch after this list)
  • What are the best practices for optimizing data storage in a data lake? (advanced)
  • Describe a complex data transformation process you implemented in a data lake project. (medium)
  • How do you handle data security and access control in a data lake architecture? (medium)
  • What are the benefits of using columnar storage in a data lake? (basic)
  • Explain the concept of data lineage and its importance in data lake management. (medium)
  • How do you handle schema evolution in a data lake environment? (advanced)
  • What are the differences between batch processing and real-time processing in a data lake? (basic)
  • Can you discuss the role of Apache Hive in data lake analytics? (medium)
  • How do you monitor and troubleshoot performance issues in a data lake cluster? (advanced)
  • What are the key considerations for designing a scalable data lake architecture? (medium)
  • Explain the concept of data lake governance and its impact on data management. (medium)
  • How do you optimize data ingestion processes in a data lake to handle large volumes of data? (medium)
  • Describe a scenario where you had to deal with data quality issues in a data lake project. How did you resolve it? (medium)
  • What are the best practices for data lake security in a cloud environment? (advanced)
  • Can you explain the concept of data catalog and its role in data lake management? (medium)
  • How do you ensure data privacy compliance in a data lake architecture? (medium)
  • What are the advantages of using Apache Flink for real-time data processing in a data lake? (advanced)
  • Describe a successful data lake implementation project you were involved in. What were the key challenges and how did you overcome them? (medium)
  • How do you handle data retention policies in a data lake to ensure data governance and compliance? (medium)
  • What are the key considerations for disaster recovery planning in a data lake environment? (advanced)
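Two of the topics above, partitioning and schema evolution, can be grounded with a short PySpark sketch; the paths and columns are hypothetical examples, not canonical interview answers.

```python
# Partitioning and Parquet schema evolution in PySpark; paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("interview-topics").getOrCreate()

df = spark.read.json("/lake/raw/events")  # hypothetical input

# Partitioning: each distinct event_date becomes a directory, so date-filtered
# queries prune whole files instead of scanning the entire dataset.
df.write.mode("overwrite").partitionBy("event_date").parquet("/lake/events")

# Schema evolution: mergeSchema reconciles Parquet files written with
# different (compatible) schemas over time into one combined schema.
evolved = spark.read.option("mergeSchema", "true").parquet("/lake/events")
evolved.printSchema()
```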

Closing Remark

As the demand for data lake professionals continues to rise in India, job seekers should focus on honing their skills in big data technologies and data management practices to stand out in the competitive job market. Prepare thoroughly for interviews by mastering both technical and conceptual aspects of data lake architecture and be confident in showcasing your expertise to potential employers. Good luck in your job search!
