41 Parquet Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

15.0 - 20.0 years

10 - 14 Lacs

Bengaluru

Work from Office

Project Role: Application Lead
Project Role Description: Lead the effort to design, build and configure applications, acting as the primary point of contact.
Must have skills: Palantir Foundry
Good to have skills: NA
Minimum 5 year(s) of experience is required.
Educational Qualification: 15 years full time education
Project Role: Lead Data Engineer
Project Role Description: Design, build and enhance applications to meet business process and requirements in Palantir Foundry.
Work experience: Minimum 6 years
Must have skills: Palantir Foundry, PySpark, TypeScript (for customizing Workshop Forms & UI)
Good to have skills: Experience in PySpark, Python and SQL. Knowledge of Big Data tools and technologies. Organizational and project management experience.
Job & Key Responsibilities: Responsible for designing, developing, testing, and supporting data pipelines and applications on Palantir Foundry. Configure and customize Workshop to design and implement workflows and ontologies. Configure and customize Workshop applications, including designing Forms, Workflows, and Ontology-based interactions. Write TypeScript to create dynamic and interactive Forms in Workshop for user-driven data entry and validation. Collaborate with data engineers and stakeholders to ensure successful deployment and operation of Palantir Foundry applications. Work with stakeholders, including the product owner, data, and design teams, to assist with data-related technical issues, understand the requirements, and design the data pipeline. Work independently, troubleshoot issues and optimize performance. Communicate design processes, ideas, and solutions clearly and effectively to the team and client. Assist junior team members in improving efficiency and productivity.
Technical Experience: Proficiency in PySpark, Python and SQL, with demonstrable ability to write and optimize SQL and Spark jobs. Hands-on experience with Palantir Foundry services such as Data Connection, Code Repository, Contour, Data Lineage and Health Checks. Good to have working experience with Workshop, Ontology and Slate. Hands-on experience in data engineering and building data pipelines (Code/No Code) for ELT/ETL data migration, data refinement and data quality checks on Palantir Foundry. Experience in TypeScript to create and customize Forms in Workshop, including form validation, user interactions, and data binding with Ontology. Experience in ingesting data from different external source systems using data connections and sync. Good knowledge of Spark architecture and hands-on experience in performance tuning and code optimization. Proficient in managing both structured and unstructured data, with expertise in handling various file formats such as CSV, JSON, Parquet, and ORC. Experience in developing and managing scalable architecture and managing large data sets. Good understanding of data loading mechanisms and the ability to implement strategies for capturing CDC. Nice to have: test-driven development and CI/CD workflows. Experience with version control software such as Git and with major hosting services (e.g. Azure DevOps, GitHub, Bitbucket, GitLab). Implementing code best practices involves adhering to guidelines that enhance code readability, maintainability, and overall quality.
Educational Qualification: 15 years of full-time education

Posted 4 days ago

6.0 - 10.0 years

2 - 6 Lacs

Pune

Work from Office

Req ID: 323909
We are currently seeking a Data Ingest Engineer to join our team in Pune, Maharashtra (IN-MH), India (IN).
Job Duties: The Applications Development Technology Lead Analyst is a senior level position responsible for establishing and implementing new or revised application systems and programs in coordination with the Technology team. This is a position within the Ingestion team of the DRIFT data ecosystem. The focus is on ingesting data in a timely, complete, and comprehensive fashion while using the latest technology available to Citi. We look for the ability to leverage new and creative methods for repeatable data ingestion from a variety of data sources while always asking "is this the best way to solve this problem?" and "am I providing the highest quality data to my downstream partners?"
Responsibilities:
• Partner with multiple management teams to ensure appropriate integration of functions to meet goals, as well as identify and define necessary system enhancements to deploy new products and process improvements
• Resolve a variety of high impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
• Provide expertise in area and advanced knowledge of applications programming and ensure application design adheres to the overall architecture blueprint
• Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
• Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
• Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
• Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
• Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.
Minimum Skills Required:
• 6-10 years of relevant experience in an Apps Development or systems analysis role
• Extensive experience in system analysis and in programming of software applications
• Application Development using Java, Scala, Spark
• Familiarity with event driven applications and streaming data
• Experience with Confluent Kafka, HDFS, Hive, structured and unstructured database systems (SQL and NoSQL)
• Experience with various schema and data types: JSON, Avro, Parquet, etc.
• Experience with various ELT methodologies and formats: JDBC, ODBC, API, Web hook, SFTP, etc.
• Experience working within the Agile and version control tool sets (JIRA, Bitbucket, Git, etc.)
• Ability to adjust priorities quickly as circumstances dictate
• Demonstrated leadership and project management skills
• Consistently demonstrates clear and concise written and verbal communication

Posted 6 days ago

5.0 - 8.0 years

0 - 1 Lacs

Hyderabad

Hybrid

Location: Hyderabad (Hybrid)
Please share your resume with +91 9361912009.
Roles and Responsibilities: Deep understanding of Linux, networking and security fundamentals. Experience working with the AWS cloud platform and infrastructure. Experience working with infrastructure as code using Terraform or Ansible. Experience managing large BigData clusters in production (at least one of Cloudera, Hortonworks, EMR). Excellent knowledge and solid work experience providing observability for BigData platforms using tools like Prometheus, InfluxDB, Dynatrace, Grafana, Splunk etc. Expert knowledge of Hadoop Distributed File System (HDFS) and Hadoop YARN. Decent knowledge of various Hadoop file formats like ORC, Parquet, Avro etc. Deep understanding of Hive (Tez), Hive LLAP, Presto and Spark compute engines. Ability to understand query plans and optimize performance for complex SQL queries on Hive and Spark. Experience supporting Spark with Python (PySpark) and R (SparklyR, SparkR). Solid professional coding experience with at least one scripting language (Shell, Python etc.). Experience working with Data Analysts, Data Scientists and at least one related analytical application such as SAS, R-Studio, JupyterHub, H2O etc. Able to read and understand code (Java, Python, R, Scala), with expertise in at least one scripting language such as Python or Shell.
Nice to have skills: Experience with workflow management tools like Airflow, Oozie etc. Knowledge of analytical libraries like Pandas, NumPy, SciPy, PyTorch etc. Implementation history with Packer, Chef, Jenkins or any other similar tooling. Prior working knowledge of Active Directory and Windows OS based VDI platforms like Citrix, AWS Workspaces etc.

Posted 1 week ago

3.0 - 8.0 years

4 - 8 Lacs

Chennai

Work from Office

Your Profile: As a senior software engineer with Capgemini, you will have 3+ years of experience in Scala with a strong project track record. Hands-on experience as a Scala/Spark developer. Hands-on SQL writing skills on RDBMS (DB2) databases. Experience in working with different file formats like JSON, Parquet, Avro, ORC and XML. Must have worked in an HDFS platform development project. Proficiency in data analysis, data profiling, and data lineage. Strong oral and written communication skills. Experience working in Agile projects.
Your Role: Work on Hadoop, Spark, Hive and SQL queries. Ability to perform code optimization for performance, scalability and configurability. Data application development at scale in the Hadoop ecosystem.
What you'll love about working here: Choosing Capgemini means having the opportunity to make a difference, whether for the world's leading businesses or for society. It means getting the support you need to shape your career in the way that works for you. It means when the future doesn't look as bright as you'd like, you have the opportunity to make change, to rewrite it. When you join Capgemini, you don't just start a new job. You become part of something bigger. A diverse collective of free-thinkers, entrepreneurs and experts, all working together to unleash human energy through technology, for an inclusive and sustainable future. At Capgemini, people are at the heart of everything we do! You can exponentially grow your career by being part of innovative projects and taking advantage of our extensive Learning & Development programs. With us, you will experience an inclusive, safe, healthy, and flexible work environment to bring out the best in you! You also get a chance to make positive social change and build a better world by taking an active role in our Corporate Social Responsibility and Sustainability initiatives. And whilst you make a difference, you will also have a lot of fun.

Posted 1 week ago

4.0 - 9.0 years

2 - 6 Lacs

Bengaluru

Work from Office

Roles and Responsibilities: 4+ years of experience as a data developer using Python. Knowledge of Spark and PySpark is preferable but not mandatory. Azure Cloud experience is preferred; alternate cloud experience is fine. Preferred experience in the Azure platform, including Azure Data Lake, Databricks and Data Factory. Working knowledge of different file formats such as JSON, Parquet, CSV, etc. Familiarity with data encryption and data masking. Database experience in SQL Server is preferable; experience in NoSQL databases like MongoDB is preferred. Team player, reliable, self-motivated, and self-disciplined.

Posted 1 week ago

8.0 - 10.0 years

10 - 12 Lacs

Hyderabad

Work from Office

ABOUT THE ROLE
Role Description: We are seeking a highly skilled and experienced hands-on Test Automation Engineering Manager with deep expertise in Data Quality (DQ), Data Integration (DIF), and Data Governance. In this role, you will design and implement automated frameworks that ensure data accuracy, metadata consistency, and compliance throughout the data pipeline, leveraging technologies like Databricks, AWS, and cloud-native tools. You will have a major focus on Data Cataloguing and Governance, ensuring that data assets are well-documented, auditable, and secure across the enterprise. You will be responsible for the end-to-end design and development of a test automation framework, working collaboratively with the team. As the delivery owner for test automation, your primary focus will be on building and automating comprehensive validation frameworks for data cataloging, data classification, and metadata tracking, while ensuring alignment with internal governance standards. You will also work closely with data engineers, product teams, and data governance leads to enforce data quality and governance policies. Your efforts will play a key role in driving data integrity, consistency, and trust across the organization. The role is highly technical and hands-on, with a strong focus on automation, metadata validation, and ensuring data governance practices are seamlessly integrated into development pipelines.
Roles & Responsibilities:
Data Quality & Integration Frameworks: Design and implement Data Quality (DQ) frameworks that validate schema compliance, transformations, completeness, null checks, duplicates, threshold rules, and referential integrity. Build Data Integration Frameworks (DIF) that validate end-to-end data pipelines across ingestion, processing, storage, and consumption layers. Automate data validations in Databricks/Spark pipelines, integrated with AWS services like S3, Glue, Athena, and Lake Formation. Develop modular, reusable validation components using PySpark, SQL, Python, and orchestration via CI/CD pipelines.
Data Cataloging & Governance: Integrate automated validations with AWS Glue Data Catalog to ensure metadata consistency, schema versioning, and lineage tracking. Implement checks to verify that data assets are properly cataloged, discoverable, and compliant with internal governance standards. Validate and enforce data classification, tagging, and access controls, ensuring alignment with data governance frameworks (e.g., PII/PHI tagging, role-based access policies). Collaborate with governance teams to automate policy enforcement and compliance checks for audit and regulatory needs.
Visualization & UI Testing: Automate validation of data visualizations in tools like Tableau, Power BI, Looker, or custom React dashboards. Ensure charts, KPIs, filters, and dynamic views correctly reflect backend data using UI automation (Selenium with Python) and backend validation logic. Conduct API testing (via Postman or Python test suites) to ensure accurate data delivery to visualization layers.
Technical Skills and Tools: Hands-on experience with data automation tools like Databricks and AWS is essential, as the manager will be instrumental in building and managing data pipelines. Leverage automated testing frameworks and containerization tools to streamline processes and improve efficiency. Experience in UI and API functional validation using tools such as Selenium with Python and Postman, ensuring comprehensive testing coverage.
Technical Leadership, Strategy & Team Collaboration: Define and drive the overall QA and testing strategy for UI and search-related components with a focus on scalability, reliability, and performance, while establishing alerting and reporting mechanisms for test failures, data anomalies, and governance violations. Contribute to system architecture and design discussions, bringing a strong quality and testability lens early into the development lifecycle. Lead test automation initiatives by implementing best practices and scalable frameworks, embedding test suites into CI/CD pipelines to enable automated, continuous validation of data workflows, catalog changes, and visualization updates. Mentor and guide QA engineers, fostering a collaborative, growth-oriented culture focused on continuous learning and technical excellence. Collaborate cross-functionally with product managers, developers, and DevOps to align quality efforts with business goals and release timelines. Conduct code reviews, test plan reviews, and pair-testing sessions to ensure team-level consistency and high-quality standards.
Good-to-Have Skills: Experience with data governance tools such as Apache Atlas, Collibra, or Alation. Understanding of DataOps methodologies and practices. Familiarity with monitoring/observability tools such as Datadog, Prometheus, or CloudWatch. Experience building or maintaining test data generators. Contributions to internal quality dashboards or data observability systems. Awareness of metadata-driven testing approaches and lineage-based validations. Experience working with agile testing methodologies such as Scaled Agile. Familiarity with automated testing frameworks like Selenium, JUnit, TestNG, or PyTest.
Must-Have Skills: Strong hands-on experience with Data Quality (DQ) framework design and automation. Expertise in PySpark, Python, and SQL for data validations. Solid understanding of ETL/ELT pipeline testing in Databricks or Apache Spark environments. Experience validating structured and semi-structured data formats (e.g., Parquet, JSON, Avro). Deep familiarity with AWS data services: S3, Glue, Athena, Lake Formation, Data Catalog. Integration of test automation with AWS Glue Data Catalog or similar catalog tools. UI automation using Selenium with Python for dashboard and web interface validation. API testing using Postman, Python, or custom API test scripts. Hands-on testing of BI tools such as Tableau, Power BI, Looker, or custom visualization layers. CI/CD test integration with tools like Jenkins, GitHub Actions, or GitLab CI. Familiarity with containerized environments (e.g., Docker, AWS ECS/EKS). Knowledge of data classification, access control validation, and PII/PHI tagging. Understanding of data governance standards (e.g., GDPR, HIPAA, CCPA). Understanding of data structures: knowledge of various data structures and their applications. Ability to analyze data and identify inconsistencies. Proven hands-on experience in test automation and data automation using Databricks and AWS. Strong knowledge of Data Integrity Frameworks (DIF) and Data Quality (DQ) principles. Familiarity with automated testing frameworks like Selenium, JUnit, TestNG, or PyTest. Strong understanding of data transformation techniques and logic.
Education and Professional Certifications: Bachelor's degree in computer science and engineering preferred; other engineering fields are considered. Master's degree and 6+ years of experience, or Bachelor's degree and 8+ years.
Soft Skills: Excellent analytical and troubleshooting skills. Strong verbal and written communication skills. Ability to work effectively with global, virtual teams. High degree of initiative and self-motivation. Ability to manage multiple priorities successfully. Team-oriented, with a focus on achieving team goals. Strong presentation and public speaking skills.

Posted 1 week ago

5.0 - 8.0 years

2 - 6 Lacs

Bengaluru

Work from Office

Job Information
Job Opening ID: ZR_1628_JOB
Date Opened: 09/12/2022
Industry: Technology
Job Type:
Work Experience: 5-8 years
Job Title: Data Engineer
City: Bangalore
Province: Karnataka
Country: India
Postal Code: 560001
Number of Positions: 4
Roles and Responsibilities: 4+ years of experience as a data developer using Python. Knowledge of Spark and PySpark is preferable but not mandatory. Azure Cloud experience is preferred; alternate cloud experience is fine. Preferred experience in the Azure platform, including Azure Data Lake, Databricks and Data Factory. Working knowledge of different file formats such as JSON, Parquet, CSV, etc. Familiarity with data encryption and data masking. Database experience in SQL Server is preferable; experience in NoSQL databases like MongoDB is preferred. Team player, reliable, self-motivated, and self-disciplined.

Posted 1 week ago

8.0 - 13.0 years

30 - 35 Lacs

Bengaluru

Work from Office

Data Engineer
Location: Bangalore - Onsite
Experience: 8 - 15 years
Type: Full-time
Role Overview: We are seeking an experienced Data Engineer to build and maintain scalable, high-performance data pipelines and infrastructure for our next-generation data platform. The platform ingests and processes real-time and historical data from diverse industrial sources such as airport systems, sensors, cameras, and APIs. You will work closely with AI/ML engineers, data scientists, and DevOps to enable reliable analytics, forecasting, and anomaly detection use cases.
Key Responsibilities: Design and implement real-time (Kafka, Spark/Flink) and batch (Airflow, Spark) pipelines for high-throughput data ingestion, processing, and transformation. Develop data models and manage data lakes and warehouses (Delta Lake, Iceberg, etc.) to support both analytical and ML workloads. Integrate data from diverse sources: IoT sensors, databases (SQL/NoSQL), REST APIs, and flat files. Ensure pipeline scalability, observability, and data quality through monitoring, alerting, validation, and lineage tracking. Collaborate with AI/ML teams to provision clean and ML-ready datasets for training and inference. Deploy, optimize, and manage pipelines and data infrastructure across on-premise and hybrid environments. Participate in architectural decisions to ensure resilient, cost-effective, and secure data flows. Contribute to infrastructure-as-code and automation for data deployment using Terraform, Ansible, or similar tools.
Qualifications & Required Skills: Bachelor's or Master's in Computer Science, Engineering, or related field. 6+ years in data engineering roles, with at least 2 years handling real-time or streaming pipelines. Strong programming skills in Python/Java and SQL. Experience with Apache Kafka, Apache Spark, or Apache Flink for real-time and batch processing. Hands-on with Airflow, dbt, or other orchestration tools. Familiarity with data modeling (OLAP/OLTP), schema evolution, and format handling (Parquet, Avro, ORC). Experience with hybrid/on-prem and cloud platform (AWS/GCP/Azure) deployments. Proficient in working with data lakes/warehouses like Snowflake, BigQuery, Redshift, or Delta Lake. Knowledge of DevOps practices, Docker/Kubernetes, Terraform or Ansible. Exposure to data observability, data cataloging, and quality tools (e.g., Great Expectations, OpenMetadata).
Good-to-Have: Experience with time-series databases (e.g., InfluxDB, TimescaleDB) and sensor data. Prior experience in domains such as aviation, manufacturing, or logistics is a plus.

Posted 1 week ago

4.0 - 8.0 years

4 - 8 Lacs

Gurugram

Work from Office

Capgemini Invent: Capgemini Invent is the digital innovation, consulting and transformation brand of the Capgemini Group, a global business line that combines market leading expertise in strategy, technology, data science and creative design, to help CxOs envision and build what's next for their businesses.
Your Role: Proficiency in MS Fabric, Azure Data Factory, Azure Synapse Analytics, Azure Databricks. Extensive knowledge of MS Fabric components: Lakehouses, OneLake, Data Pipelines, Real-Time Analytics, Power BI Integration, Semantic Model. Integrate Fabric capabilities for seamless data flow, governance, and collaboration across teams. Strong understanding of Delta Lake, Parquet, and distributed data systems. Strong programming skills in Python, PySpark, Scala or Spark SQL/T-SQL for data transformations.
Your Profile: Strong experience in implementation and management of a lakehouse using Databricks and the Azure tech stack (ADLS Gen2, ADF, Azure SQL). Proficiency in data integration techniques, ETL processes and data pipeline architectures. Understanding of Machine Learning algorithms and AI/ML frameworks (e.g., TensorFlow, PyTorch) and Power BI is an added advantage. MS Fabric and PySpark are a must.
What you will love about working here: We recognize the significance of flexible work arrangements to provide support. Be it remote work or flexible work hours, you will get an environment to maintain a healthy work-life balance. At the heart of our mission is your career growth. Our array of career growth programs and diverse professions are crafted to support you in exploring a world of opportunities. Equip yourself with valuable certifications in the latest technologies such as Generative AI.
About Capgemini: Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fueled by its market leading capabilities in AI, cloud and data, combined with its deep industry expertise and partner ecosystem. The Group reported 2023 global revenues of €22.5 billion.

Posted 1 week ago

4.0 - 9.0 years

4 - 8 Lacs

Hyderabad

Work from Office

Data Transformation: Utilize Data Build Tool (dbt) to transform raw data into curated data models according to business requirements. Implement data transformations and aggregations to support analytical and reporting needs. Orchestration and Automation: Design and implement automated workflows using Google Cloud Composer to orchestrate data pipelines and ensure timely data delivery. Monitor and troubleshoot data pipelines, identifying and resolving issues proactively. Develop and maintain documentation for data pipelines and workflows. GCP Expertise: Leverage GCP services, including BigQuery, Cloud Storage, and Pub/Sub, to build a robust and scalable data platform. Optimize BigQuery performance and cost through efficient query design and data partitioning. Implement data security and access controls in accordance with banking industry standards. Collaboration and Communication: Collaborate with Solution Architect and Data Modeler to understand data requirements and translate them into technical solutions. Communicate effectively with team members and stakeholders, providing regular updates on project progress. Participate in code reviews and contribute to the development of best practices. Data Pipeline Development: Design, develop, and maintain scalable and efficient data pipelines using Google Cloud Dataflow to ingest data from various sources, including relational databases (RDBMS), data streams, and files. Implement data quality checks and validation processes to ensure data accuracy and consistency. Optimize data pipelines for performance and cost-effectiveness. Banking Domain Knowledge (Preferred): Understanding of banking data domains, such as customer data, transactions, and financial products. Familiarity with regulatory requirements and data governance standards in the banking industry. Required Experience: Bachelor's degree in computer science, Engineering, or a related field. ETL Knowledge. 
4-9 years of experience in data engineering, with a focus on building data pipelines and data transformations. Strong proficiency in SQL and experience working with relational databases. Hands-on experience with Google Cloud Platform (GCP) services, including Dataflow, BigQuery, Cloud Composer, and Cloud Storage. Experience with data transformation tools, preferably Data Build Tool (dbt). Proficiency in Python or other scripting languages is a plus. Experience with data orchestration and automation. Strong problem-solving and analytical skills. Excellent communication and collaboration skills. Experience with data streams like Pub/Sub or similar. Experience in working with files such as CSV, JSON and Parquet. Primary Skills: GCP, Dataflow, BigQuery, Cloud Composer, Cloud Storage, Data Pipeline, Composer, SQL, DBT, DWH Concepts. Secondary Skills: Python, Banking Domain knowledge, pub/sub, Cloud certifications (e.g. Data engineer), Git or any other version control system.

Posted 2 weeks ago

4.0 - 7.0 years

6 - 9 Lacs

Bengaluru

Work from Office

What this job involves: JLL, an international real estate management company, is seeking an Data Engineer to join our JLL Technologies Team. We are seeking candidates that are self-starters to work in a diverse and fast-paced environment that can join our Enterprise Data team. We are looking for a candidate that is responsible for designing and developing of data solutions that are strategic for the business using the latest technologies Azure Databricks, Python, PySpark, SparkSQL, Azure functions, Delta Lake, Azure DevOps CI/CD. Responsibilities Design, Architect, and Develop solutions leveraging cloud big data technology to ingest, process and analyze large, disparate data sets to exceed business requirements. Design & develop data management and data persistence solutions for application use cases leveraging relational, non-relational databases and enhancing our data processing capabilities. Develop POCs to influence platform architects, product managers and software engineers to validate solution proposals and migrate. Develop data lake solution to store structured and unstructured data from internal and external sources and provide technical guidance to help migrate colleagues to modern technology platform. Contribute and adhere to CI/CD processes, development best practices and strengthen the discipline in Data Engineering Org. Develop systems that ingest, cleanse and normalize diverse datasets, develop data pipelines from various internal and external sources and build structure for previously unstructured data. Using PySpark and Spark SQL, extract, manipulate, and transform data from various sources, such as databases, data lakes, APIs, and files, to prepare it for analysis and modeling. Build and optimize ETL workflows using Azure Databricks and PySpark. This includes developing efficient data processing pipelines, data validation, error handling, and performance tuning. 
Perform the unit testing, system integration testing, regression testing and assist with user acceptance testing. Articulates business requirements in a technical solution that can be designed and engineered. Consults with the business to develop documentation and communication materials to ensure accurate usage and interpretation of JLL data. Implement data security best practices, including data encryption, access controls, and compliance with data protection regulations. Ensure data privacy, confidentiality, and integrity throughout the data engineering processes. Performs data analysis required to troubleshoot data related issues and assist in the resolution of data issues. Experience & Education Minimum of 4 years of experience as a data developer using Python, PySpark, Spark Sql, ETL knowledge, SQL Server, ETL Concepts. Bachelors degree in Information Science, Computer Science, Mathematics, Statistics or a quantitative discipline in science, business, or social science. Experience in Azure Cloud Platform, Databricks, Azure storage. Effective written and verbal communication skills, including technical writing. Excellent technical, analytical and organizational skills. Technical Skills & Competencies Experience handling un-structured, semi-structured data, working in a data lake environment, leveraging data streaming and developing data pipelines driven by events/queues Hands on Experience and knowledge on real time/near real time processing and ready to code Hands on Experience in PySpark, Databricks, and Spark Sql. Knowledge on json, Parquet and Other file format and work effectively with them No Sql Databases Knowledge like Hbase, Mongo, Cosmos etc. Preferred Cloud Experience on Azure or AWS Python-spark, Spark Streaming, Azure SQL Server, Cosmos DB/Mongo DB, Azure Event Hubs, Azure Data Lake Storage, Azure Search etc. 
Team player: a reliable, self-motivated, and self-disciplined individual capable of executing multiple projects simultaneously within a fast-paced environment, working with cross-functional teams.

Posted 2 weeks ago

Apply

3.0 - 5.0 years

50 - 60 Lacs

Bengaluru

Work from Office

Staff Data Engineer. Experience: 3 - 5 Years. Exp Salary: INR 50-60 Lacs per annum. Preferred Notice Period: Within 30 Days. Shift: 4:00PM to 1:00AM IST. Opportunity Type: Remote. Placement Type: Permanent. (*Note: This is a requirement for one of Uplers' Clients.) Must have skills required: ClickHouse, DuckDB, AWS, Python, SQL. Good to have skills: DBT, Iceberg, Kestra, Parquet, SQLGlot. Rill Data (One of Uplers' Clients) is Looking for: Staff Data Engineer who is passionate about their work, eager to learn and grow, and who is committed to delivering exceptional results. If you are a team player, with a positive attitude and a desire to make a difference, then we want to hear from you. Role Overview Description: Rill is the world's fastest BI tool, designed from the ground up for real-time databases like DuckDB and ClickHouse. Our platform combines last-mile ETL, an in-memory database, and interactive dashboards into a full-stack solution that's easy to deploy and manage. With a BI-as-code approach, Rill empowers developers to define and collaborate on metrics using SQL and YAML. Trusted by leading companies in e-commerce, digital marketing, and financial services, Rill provides the speed and scalability needed for operational analytics and partner-facing reporting. Job Summary Overview: Rill is looking for a Staff Data Engineer to join our Field Engineering team. In this role, you will work closely with enterprise customers to design and optimize high-performance data pipelines powered by DuckDB and ClickHouse. You will also collaborate with our platform engineering team to evolve our incremental ingestion architectures and support proof-of-concept sales engagements. The ideal candidate has strong SQL fluency, experience with orchestration frameworks (e.g., Kestra, dbt, SQLGlot), familiarity with data lake table formats (e.g., Iceberg, Parquet), and an understanding of cloud databases (e.g., Snowflake, BigQuery).
Most importantly, you should have a passion for solving real-world data engineering challenges at scale. Key Responsibilities: Collaborate with enterprise customers to optimize data models for performance and cost efficiency. Work with the platform engineering team to enhance and refine our incremental ingestion architectures. Partner with account executives and solution architects to rapidly prototype solutions for proof-of-concept sales engagements. Qualifications (required): Fluency in SQL and competency in Python. Bachelor's degree in a STEM discipline or equivalent industry experience. 3+ years of experience in a data engineering or related role. Familiarity with major cloud environments (AWS, Google Cloud, Azure). Benefits: Competitive salary. Health insurance. Flexible vacation policy. How to apply for this opportunity: Easy 3-Step Process: 1. Click On Apply! And Register or log in on our portal. 2. Upload updated Resume & Complete the Screening Form. 3. Increase your chances to get shortlisted & meet the client for the Interview! About Our Client: Rill is an operational BI tool that provides fast dashboards that your team will actually use. Data teams build fewer, more flexible dashboards for business users, while business users make faster decisions and perform root cause analysis, with fewer ad hoc requests. Rill's unique architecture combines a last-mile ETL service, an in-memory database, and operational dashboards - all in a single solution. Our customers are leading media & advertising platforms, including Comcast's Freewheel, tvScientific, AT&T's DishTV, and more. About Uplers: Our goal is to make hiring and getting hired reliable, simple, and fast. Our role will be to help all our talents find and apply for relevant product and engineering job opportunities and progress in their career. (Note: There are many more opportunities apart from this on the portal.)
So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!

Posted 2 weeks ago

Apply

5.0 - 10.0 years

8 - 18 Lacs

Hyderabad

Work from Office

Position Title: Senior Software Engineer. Job Location: Hyderabad. Education: CSE, ECE, IT, EEE. Essential: Python, Vue JS, JavaScript, PostgreSQL. Desired: Deployment, Gitlab, Linux. Knowledge: AWS, Docker, ETL, Keycloak. Experience: 4 to 8 Years. Experience in building and deploying web applications using the Python, Vue JS, and React ecosystem. Experience in the deployment process of web servers (Django) and Vue JS or React JS applications using Nginx or Apache. Summary of work Environment and Work performed: Develop and maintain web-based applications (including Mobile Web) using mainly the Python and JavaScript programming languages and the Django, Vue JS, and React JS frameworks. Specific Duties: Full stack developer who can understand and write reusable, testable and efficient code using the Python and JavaScript languages. Responsible for regular communication with others involved in the development process. Design and implementation of low-latency, high-availability, and performant applications. Desired Outcome: Ability to design performant applications with testable and reusable code. Ability to adapt to different programming languages or frameworks based on requirements. Able to participate in code reviews and design reviews. Skills and abilities: Python: Django, Django Rest framework. JavaScript: Vue JS, React JS, Quasar Framework. Databases: PostgreSQL, Redis, MSSQL, Influx, ETL. Deployment: Nginx, Apache, Linux. Source Control: Git, GitLab, CI/CD. Best Regards, Kodandapani.D Executive-HR Dept:Human Resource 8341119158 Kodandapani.dabbugunta@tmeic.in Kodandapaani Yadav Dabbugunta | LinkedIn TMEIC Industrial Systems India Pvt. Ltd. Group company of TMEIC Corporation, Japan. Address: Unit No. 06-01, Level-6, Block-2, Cyber Pearl, HITEC City, Madhapur, Hyderabad 500081, Telangana. Website: www.tmeic.com

Posted 2 weeks ago

Apply

5.0 - 10.0 years

20 - 35 Lacs

Bengaluru

Work from Office

Job Title: Senior Data Engineer, ML & Azure Platform. Location: Bangalore. Experience: 5 - 10 years. Joining Timeframe: Only candidates who can join within 1 month will be considered. Job Description: We are seeking a skilled Senior Data Engineer to work on end-to-end data engineering and data science use cases. The ideal candidate will have strong expertise in Python or Scala, Spark (Databricks), and SQL, and experience building scalable and efficient data pipelines on Azure. Primary Skills: Azure Data Platform (Data Factory, Databricks). Strong experience in SQL and Python or Scala. Experience with ETL/ELT pipelines and transformations. Knowledge of Spark, Delta Lake, Parquet, and Big Data technologies. Familiarity with MLOps, CI/CD pipelines, model monitoring, versioning. Performance tuning and pipeline optimization. Data quality checks and feature engineering. Nice-to-Have Skills: Exposure to NLP, time-series forecasting, anomaly detection. Knowledge of data governance frameworks. Understanding of retail or workforce analytics domains. Note: Please apply only if you're available to join within 1 month. To Apply: Kindly share your updated resume, current CTC, expected CTC and notice period to vijay.s@xebia.com.

Posted 2 weeks ago

Apply

5.0 - 10.0 years

1 - 5 Lacs

Bengaluru

Work from Office

Job Title: AWS Data Engineer. Experience: 5-10 Years. Location: Bangalore. Technical Skills: 5+ years of experience as an AWS Data Engineer with AWS S3, Glue Catalog, Glue Crawler, Glue ETL, and Athena. Write Glue ETLs to convert data in AWS RDS for SQL Server and Oracle DB to Parquet format in S3. Execute Glue crawlers to catalog S3 files. Create catalog of S3 files for easier querying. Create SQL queries in Athena. Define data lifecycle management for S3 files. Strong experience in developing, debugging, and optimizing Glue ETL jobs using PySpark or Glue Studio. Ability to connect Glue ETLs with AWS RDS (SQL Server and Oracle) for data extraction and write transformed data into Parquet format in S3. Proficiency in setting up and managing Glue Crawlers to catalog data in S3. Deep understanding of S3 architecture and best practices for storing large datasets. Experience in partitioning and organizing data for efficient querying in S3. Knowledge of Parquet file format advantages for optimized storage and querying. Expertise in creating and managing the AWS Glue Data Catalog to enable structured and schema-aware querying of data in S3. Experience with Amazon Athena for writing complex SQL queries and optimizing query performance. Familiarity with creating views or transformations in Athena for business use cases. Knowledge of securing data in S3 using IAM policies, S3 bucket policies, and KMS encryption. Understanding of regulatory requirements (e.g., GDPR) and implementing secure data handling practices. Non-Technical Skills: Candidate needs to be a good team player with effective interpersonal, team-building and communication skills. Ability to communicate complex technology to a non-technical audience in a simple and precise manner.

Posted 3 weeks ago

Apply

3.0 - 7.0 years

6 - 10 Lacs

Mumbai

Work from Office

Role Overview: Looking for a Kafka SME to design and support real-time data ingestion pipelines using Kafka within a Cloudera-based Lakehouse architecture. Key Responsibilities: Design Kafka topics, partitions, and schema registry. Implement producer-consumer apps using Spark Structured Streaming. Set up Kafka Connect, monitoring, and alerts. Ensure secure, scalable message delivery. Required education: Bachelor's Degree. Preferred education: Master's Degree. Required technical and professional expertise: Deep understanding of Kafka internals and ecosystem. Integration with Cloudera and NiFi. Schema evolution and serialization (Avro, Parquet). Performance tuning and fault-tolerance. Preferred technical and professional experience: Good communication skills. India market experience is preferred.

Posted 3 weeks ago

Apply

5.0 - 8.0 years

7 - 10 Lacs

Mumbai

Work from Office

So, what's the job? You'll lead the design, development, and optimization of scalable, maintainable, and high-performance ETL/ELT pipelines using Informatica IDMC CDI. You'll manage and optimize cloud-based storage environments, including AWS S3 buckets. You'll implement robust data integration solutions that ingest, cleanse, transform, and deliver structured and semi-structured data from diverse sources to downstream systems and data warehouses. You'll support data integration from source systems, ensuring data quality and completeness. You'll automate data loading and transformation processes using tools such as Python, SQL, and orchestration frameworks. You'll contribute to the strategic transition toward cloud-native data platforms (e.g., AWS S3, Snowflake) by designing hybrid or fully cloud-based data solutions. You'll collaborate with Data Architects to align data models and structures with enterprise standards. You'll maintain clear documentation of data pipelines, processes, and technical standards, and mentor team members in best practices and tool usage. You'll implement and enforce data security, access controls, and compliance measures in line with organizational policies. And what are we looking for? You'll have a Bachelor's degree in Computer Science, Engineering, or a related field with a minimum of 5 years of industry experience. You'll be an expert in designing, developing, and optimizing ETL/ELT pipelines using Informatica IDMC Cloud Data Integration (CDI). You'll bring strong experience with data ingestion, transformation, and delivery across diverse data sources and targets. You'll have a deep understanding of data integration patterns, orchestration strategies, and data pipeline lifecycle management. You'll be proficient in implementing incremental loads, CDC (Change Data Capture), and data synchronization. You'll bring strong experience with SQL Server, including performance tuning, stored procedures, and indexing strategies.
You'll possess a solid understanding of data modeling, data warehousing concepts (star/snowflake schema), and dimensional modeling. You'll have experience integrating with cloud data warehouses such as Snowflake. You'll be familiar with cloud storage and compute platforms such as AWS S3, EC2, Lambda, Glue, and RDS. You'll design and implement cloud-native data architectures using modern tools and best practices. You'll have exposure to data migration and hybrid architecture design (on-prem to cloud). You'll be experienced with Informatica Intelligent Cloud Services (IICS), especially IDMC CDI. You'll have strong proficiency in SQL, T-SQL, and scripting languages like Python or Shell. You'll have experience with workflow orchestration tools like Apache Airflow, Informatica task flows, or Control-M. You'll be knowledgeable in API integration, REST/SOAP, and file-based data exchange (e.g., SFTP, CSV, Parquet). You'll implement data validation, error handling, and data quality frameworks. You'll have an understanding of data lineage, metadata management, and governance best practices. You'll set up monitoring, logging, and alerting for ETL processes.

Posted 3 weeks ago

Apply

10.0 - 15.0 years

25 - 35 Lacs

Pune

Work from Office

Education and Qualifications: Bachelor's degree in IT, Computer Science, Software Engineering, Business Analytics or equivalent. Work Experience: Minimum 10 years of experience in the data analytics field. Minimum 6 years of experience in running operations and support in a Cloud Data Lakehouse environment. Experience with Azure Databricks. Experience in building and optimizing data pipelines, architectures and data sets. Excellent experience in Scala or Python. Ability to troubleshoot and optimize complex queries on the Spark platform. Knowledgeable on structured and unstructured data design/modeling, data access and data storage techniques. Experience with DevOps tools and environment. Technical / Professional Skills (please provide at least 3): Azure Databricks; Python / Scala / Java; HIVE / HBase / Impala / Parquet; Sqoop, Kafka, Flume; SQL and RDBMS; Airflow; Jenkins / Bamboo; Github / Bitbucket; Nexus. Have you worked in sizing clusters for Databricks in an Azure cloud environment? Have you done hands-on configuration and administration of the Databricks platform on Azure Cloud? Do you have experience in cluster management, storage management, workspace management, key management etc.? Have you done cost optimization exercises to reduce the consumption cost of Databricks clusters? Have you done cost forecasting of the Databricks platform on Azure Cloud? How do you monitor cost anomalies, identify cost drivers and come up with recommendations? Have you done any RBAC configuration in the Databricks platform on Azure Cloud? Have you configured connectivity from Databricks to internal/external sources/applications such as Power BI, Google Analytics, SharePoint etc.? What have you implemented, and how do you monitor the health of the Databricks platform, its services, the ETL pipeline and the end-points? What kind of proactive or self-healing processes are put in place to ensure service availability?

Posted 3 weeks ago

Apply

9 - 11 years

37 - 40 Lacs

Ahmedabad, Bengaluru, Mumbai (All Areas)

Work from Office

Dear Candidate, We are hiring a Scala Developer to work on high-performance distributed systems, leveraging the power of functional and object-oriented paradigms. This role is perfect for engineers passionate about clean code, concurrency, and big data pipelines. Key Responsibilities: Build scalable backend services using Scala and the Play or Akka frameworks. Write concurrent and reactive code for high-throughput applications. Integrate with Kafka, Spark, or Hadoop for data processing. Ensure code quality through unit tests and property-based testing. Work with microservices, APIs, and cloud-native deployments. Required Skills & Qualifications: Proficient in Scala, with a strong grasp of functional programming. Experience with Akka, Play, or Cats. Familiarity with Big Data tools and RESTful API development. Bonus: Experience with ZIO, Monix, or Slick. Soft Skills: Strong troubleshooting and problem-solving skills. Ability to work independently and in a team. Excellent communication and documentation skills. Note: If interested, please share your updated resume and preferred time for a discussion. If shortlisted, our HR team will contact you. Kandi Srinivasa Reddy, Delivery Manager, Integra Technologies

Posted 1 month ago

Apply

8 - 11 years

45 - 50 Lacs

Chennai, Noida, Kolkata

Work from Office

Dear Candidate, We are hiring a Scala Developer to work on scalable data pipelines, distributed systems, and backend services. This role is perfect for candidates passionate about functional programming and big data. Key Responsibilities: Develop data-intensive applications using Scala. Work with frameworks like Akka, Play, or Spark. Design and maintain scalable microservices and ETL jobs. Collaborate with data engineers and platform teams. Write clean, testable, and well-documented code. Required Skills & Qualifications: Strong in Scala, Functional Programming, and JVM internals. Experience with Apache Spark, Kafka, or Cassandra. Familiar with SBT, Cats, or Scalaz. Knowledge of CI/CD, Docker, and cloud deployment tools. Soft Skills: Strong troubleshooting and problem-solving skills. Ability to work independently and in a team. Excellent communication and documentation skills. Note: If interested, please share your updated resume and preferred time for a discussion. If shortlisted, our HR team will contact you. Kandi Srinivasa, Delivery Manager, Integra Technologies

Posted 2 months ago

Apply

4 - 7 years

6 - 10 Lacs

Gurgaon

Work from Office

Experienced in MS SQL Server, SSIS, and Talend. Knowledge of Databricks, Snowflake, Teradata, Oracle, Google BigQuery, Hadoop, and related stacks, and of diverse data sources and data formats (XML, JSON, YAML, Parquet, Avro, Delta). Required Candidate profile Notice Period: Not Available Education: BE BTech, ME MTech, BCA, MCA Notice Period: immediate-30 days

Posted 2 months ago

Apply

5 - 10 years

9 - 19 Lacs

Hyderabad

Work from Office

Roles & Responsibilities: We are seeking a skilled Senior Backend Java Engineer with experience in Databricks and working with big data file formats (e.g., Parquet, Delta, Avro, ORC) to join our team. This role involves creating and maintaining microservices for our application. The ideal candidate should have a strong background in backend development and a keen understanding of modern data processing frameworks. Key Responsibilities: Design, develop, and maintain high-quality microservices for our application. Work with Databricks for data processing and transformation. Handle various big data file formats, like Parquet, Delta, Avro, and ORC, for efficient data storage and retrieval. Collaborate with cross-functional teams to ensure scalable and maintainable system architecture. Optimize applications for performance and scalability in cloud environments. Utilize AWS services to build and deploy cloud-native solutions (preferred). Required Skills and Qualifications: Strong expertise in Java and backend development. Hands-on experience with Databricks and working with big data file formats such as Parquet, Delta, Avro, or ORC. Experience with microservices architecture and related frameworks. Familiarity with AWS services or similar cloud solutions is a plus. Excellent problem-solving skills and ability to work in a collaborative environment

Posted 2 months ago

Apply

5 - 8 years

15 - 27 Lacs

Hyderabad, Gurgaon, Noida

Work from Office

We're Nagarro. We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at a scale across all devices and digital mediums, and our people exist everywhere in the world (18000+ experts across 36 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in! REQUIREMENTS: Expert knowledge in databases like PostgreSQL (preferably cloud-hosted in AWS, Azure, GCP), and Snowflake Data Warehouse with strong programming experience in SQL. Competence in data preparation and/or ETL tools to build and maintain data pipelines and flows. Expertise in Python and experience working on ML models. Deep knowledge of databases, stored procedures, and optimization of large data sets. In-depth knowledge of ingestion techniques, data cleaning, de-duplication, and partitioning. Understanding of index design and performance-tuning techniques. Familiarity with SQL security techniques such as data encryption at the column level, Transparent Data Encryption (TDE), signed stored procedures, and assignment of user permissions. Experience in understanding source data from various platforms and mapping them into Entity Relationship Models (ER) for data integration and reporting. Exposure to source control tools like GIT, Azure DevOps. Understanding of Agile methodologies (Scrum, Kanban). Experience with automated testing and coverage tools. Experience with CI/CD automation tools (desirable). Programming language experience in Golang (desirable). RESPONSIBILITIES: Design and implement Snowflake-based data warehouse solutions. Develop and optimize complex SQL queries, stored procedures, and views in Snowflake. Build ETL/ELT data pipelines for efficient data processing. Work with structured and semi-structured data (JSON, Parquet, Avro) for data ingestion and processing. 
Implement data partitioning, clustering, and performance tuning strategies. Manage role-based access control (RBAC), security, and data governance in Snowflake. Integrate Snowflake with BI tools (Power BI, Tableau, Looker) for reporting and analytics. Create and maintain optimal data pipeline architecture. Assemble large, complex data sets that meet functional/non-functional business requirements. Build pipelines for optimal extraction, transformation, and loading of data from various sources using SQL and cloud database technologies. Prepare ML models for data analysis and prediction. Work with stakeholders including Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs. Ensure data separation and security across national boundaries through multiple data centers and regions. Collaborate with data and analytics experts to enhance functionality in our data systems. Manage exploratory data analysis to support database and dashboard development.

Posted 2 months ago

Apply

6 - 11 years

8 - 14 Lacs

Chennai

Hybrid

We are seeking an Azure Data Tester to help create test automation by following an existing data automation framework. The role involves validating business rules for audience creation across multiple social media channels. Required Candidate profile: Validate data flow from disparate sources into various data stores, including Event Hub and Data Lakes. Post ingestion, data will be transformed using Azure Data Factory workflows and stored in Azure Databricks.

Posted 2 months ago

Apply

5 - 10 years

10 - 15 Lacs

Pune

Work from Office

Meeting with managers to determine the company's Big Data needs. Developing big data solutions on AWS using Apache Spark, Databricks, Delta Tables, EMR, Athena, Glue, and Hadoop. Familiarity with data warehousing will be a plus. NoSQL and RDBMS databases. Required Candidate profile: Loading disparate data sets and conducting pre-processing services using Athena, Glue, Spark, etc. Building cloud platforms for the development of company applications. Maintaining production systems.

Posted 2 months ago

Apply


Featured Companies