
74 Parquet Jobs - Page 3

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

7.0 - 9.0 years

0 Lacs

Pune, Maharashtra, India

On-site

Vodafone Idea Limited is an Aditya Birla Group and Vodafone Group partnership. It is India's leading telecom service provider. The Company provides pan-India Voice and Data services across 2G, 3G and 4G platforms. With a large spectrum portfolio to support the growing demand for data and voice, the Company is committed to delivering delightful customer experiences and contributing towards creating a truly Digital India by enabling millions of citizens to connect and build a better tomorrow. The Company is developing infrastructure to introduce newer and smarter technologies, making both retail and enterprise customers future ready with innovative offerings, conveniently accessible through an ecosystem of digital channels as well as an extensive on-ground presence. The Company is listed on the National Stock Exchange (NSE) and the Bombay Stock Exchange (BSE) in India.
We're proud to be an equal opportunity employer. At VIL, we know that diversity makes us stronger. We are committed to a collaborative, inclusive environment that encourages authenticity and fosters a sense of belonging. We strive for everyone to feel valued, connected and empowered to reach their potential and contribute their best. VIL's goal is to build and maintain a workforce that is diverse in experience and background but uniform in reflecting our Values of Passion, Boldness, Trust, Speed and Digital. Consequently, our recruiting efforts are directed towards attracting and retaining the best and brightest talent. Our endeavour is to be the first choice for prospective employees. VIL ensures equal employment opportunity without discrimination or harassment based on race, colour, religion, creed, age, sex, sex stereotype, gender, gender identity or expression, sexual orientation, national origin, citizenship, disability, marital and civil partnership/union status, pregnancy, veteran or military service status, genetic information, or any other characteristic protected by law.
VIL is an equal opportunity employer committed to diversifying its workforce.
Role: Legal & IPDR Hadoop Ops
Job Level/Designation: AGM (Band M2)
Function/Department: IT-Application Operations
Location: Pune
Job Purpose: Manage central operations of Legal on the CDP platform and IPDR Hadoop platform services, ensuring critical deliveries to nodal and law enforcement agencies. Accountable for ensuring that critical Voice, SMS, GPRS CDRs and IPDRs are available timely and accurately for consumption by nodal and law enforcement teams. Oversee incidents, problems, releases, changes and solution reviews, ensuring uninterrupted service support as well as a seamless transition from Build to Operate. Spearhead operational excellence initiatives and enforce compliance with IT KPIs to maintain high service availability. Drive operational excellence initiatives to ensure compliance with DOT service SLA commitments. Demonstrate a strong service-oriented approach, taking full ownership of service management in a multi-partner setup.
Key Result Areas/Accountabilities: Ensure operational excellence and manage the legal system and the IPDR platform, both on Cloudera Data Platform. Ensure high performance and service excellence for Nodal and Legal teams. Ensure adherence to 7 critical IT SLAs while processing 8+ trillion CDRs daily leading to 50+ Nodal & Legal KPIs, and ensure adherence to high service performance guarantees through a multi-partner engagement comprising Accolite, Kyndryl, Cloudera and IBM. Manage a direct/indirect team of approx. 30+ resources. Drive multi-partner governance to deliver service improvements, ensure compliance with service guarantees and communicate service performance to managers. Drive excellence initiatives through automation to bring in operational efficiency, and leverage AI, ML, RPA and similar technologies to streamline operations, reduce MTTR, improve processes and foster cost efficiency and business benefits.
Good knowledge of service management and close coordination with the delivery and transformation teams to ensure smooth and seamless deployment of new IT applications or updates to existing ones.
Core Competencies, Knowledge, Experience: A technologist with 7+ years of experience in managing Hadoop Big Data technologies, viz. Hortonworks Data Platform and Cloudera Data Platform. Good knowledge of Cloudera Manager and CDP cluster tuning. Must be proficient with the latest technologies, viz. MapReduce, Kafka, Kerberos, Impala, HDFS, HBase, Hive, Kudu, Spark, Iceberg, YARN and Ozone. Should have good knowledge of the Parquet, ORC and Avro Hadoop file formats. Experience in managing production environments, including high availability, SLAs, KPI monitoring, performance optimization, real-time ingestion, and data model management. Expertise in driving automation initiatives for incident reduction, process optimization and high availability through cutting-edge technologies, fostering an environment of continuous improvement. Proficient in analytical, independent thinking with a strong technological background. Skilled in managing application operations and incident & problem management, with good knowledge of capacity planning and performance management. Proficient in service delivery and in streamlining processes and technologies, eliminating redundancies, and optimizing solutions. Proficient in fine-tuning Big Data cluster technologies to improve performance. Exposure to Agile, ITIL, and DevOps methodologies. Good communication and stakeholder management skills to manage partners and business stakeholders alike.
Must have technical / professional qualifications: BE, MCA/MBA or higher; Cloudera Certification in Cloudera Data Platform; Hadoop Certification in Hortonworks Data Platform
Years of Experience: 7+ years
Industries to look from: Telecom Big Data Professionals from BFSI, FMCG, Network Equipment manufacturers, IT Service providers
Ideal Organizations to look from: TSP IT service providers for TSP (viz. White Clay, IBM, Infosys, Koenig Sol, Aveva, Equinix, PoleStar Sol, Kyndryl, Ericsson, Cisco)
Direct Reports: NA
Vodafone Idea Limited (formerly Idea Cellular Limited), an Aditya Birla Group & Vodafone partnership

Posted 1 month ago


2.0 - 4.0 years

7 - 9 Lacs

Hyderabad, Chennai, Bengaluru

Hybrid

POSITION: Senior Data Engineer / Data Engineer
LOCATION: Bangalore/Mumbai/Kolkata/Gurugram/Hyd/Pune/Chennai
EXPERIENCE: 2+ Years
JOB TITLE: Senior Data Engineer / Data Engineer
OVERVIEW OF THE ROLE: As a Data Engineer or Senior Data Engineer, you will be hands-on in architecting, building, and optimizing robust, efficient, and secure data pipelines and platforms that power business-critical analytics and applications. You will play a central role in the implementation and automation of scalable batch and streaming data workflows using modern big data and cloud technologies. Working within cross-functional teams, you will deliver well-engineered, high-quality code and data models, and drive best practices for data reliability, lineage, quality, and security.
HASHEDIN BY DELOITTE 2025
Mandatory Skills: Hands-on software coding or scripting for a minimum of 3 years. Experience in product management for at least 2 years. Stakeholder management experience for at least 3 years. Experience in one of the GCP, AWS or Azure cloud platforms.
Key Responsibilities: Design, build, and optimize scalable data pipelines and ETL/ELT workflows using Spark (Scala/Python), SQL, and orchestration tools (e.g., Apache Airflow, Prefect, Luigi). Implement efficient solutions for high-volume, batch, real-time streaming, and event-driven data processing, leveraging best-in-class patterns and frameworks. Build and maintain data warehouse and lakehouse architectures (e.g., Snowflake, Databricks, Delta Lake, BigQuery, Redshift) to support analytics, data science, and BI workloads. Develop, automate, and monitor Airflow DAGs/jobs on cloud or Kubernetes, following robust deployment and operational practices (CI/CD, containerization, infra-as-code). Write performant, production-grade SQL for complex data aggregation, transformation, and analytics tasks. Ensure data quality, consistency, and governance across the stack, implementing processes for validation, cleansing, anomaly detection, and reconciliation.
Collaborate with Data Scientists, Analysts, and DevOps engineers to ingest, structure, and expose structured, semi-structured, and unstructured data for diverse use cases. Contribute to data modeling, schema design, and data partitioning strategies, and ensure adherence to best practices for performance and cost optimization. Implement, document, and extend data lineage, cataloging, and observability through tools such as AWS Glue, Azure Purview, Amundsen, or open-source technologies. Apply and enforce data security, privacy, and compliance requirements (e.g., access control, data masking, retention policies, GDPR/CCPA). Take ownership of the end-to-end data pipeline lifecycle: design, development, code reviews, testing, deployment, operational monitoring, and maintenance/troubleshooting. Contribute to frameworks, reusable modules, and automation to improve development efficiency and maintainability of the codebase. Stay abreast of industry trends and emerging technologies, participating in code reviews, technical discussions, and peer mentoring as needed.
Skills & Experience: Proficiency with Spark (Python or Scala), SQL, and data pipeline orchestration (Airflow, Prefect, Luigi, or similar). Experience with cloud data ecosystems (AWS, GCP, Azure) and cloud-native services for data processing (Glue, Dataflow, Dataproc, EMR, HDInsight, Synapse, etc.). Hands-on development skills in at least one programming language (Python, Scala, or Java preferred); solid knowledge of software engineering best practices (version control, testing, modularity). Deep understanding of batch and streaming architectures (Kafka, Kinesis, Pub/Sub, Flink, Structured Streaming, Spark Streaming). Expertise in data warehouse/lakehouse solutions (Snowflake, Databricks, Delta Lake, BigQuery, Redshift, Synapse) and storage formats (Parquet, ORC, Delta, Iceberg, Avro). Strong SQL development skills for ETL, analytics, and performance optimization.
Familiarity with Kubernetes (K8s), containerization (Docker), and deploying data pipelines in distributed/cloud-native environments. Experience with data quality frameworks (Great Expectations, Deequ, or custom validation), monitoring/observability tools, and automated testing. Working knowledge of data modeling (star/snowflake, normalized, denormalized) and metadata/catalog management. Understanding of data security, privacy, and regulatory compliance (access management, PII masking, auditing, GDPR/CCPA/HIPAA). Familiarity with BI or visualization tools (Power BI, Tableau, Looker, etc.) is an advantage but not core. Previous experience with data migrations, modernization, or refactoring legacy ETL processes to modern cloud architectures is a strong plus. Bonus: Exposure to open-source data tools (dbt, Delta Lake, Apache Iceberg, Amundsen, Great Expectations, etc.) and knowledge of DevOps/MLOps processes.
Professional Attributes: Strong analytical and problem-solving skills; attention to detail and commitment to code quality and documentation. Ability to communicate technical designs and issues effectively with team members and stakeholders. Proven self-starter, fast learner, and collaborative team player who thrives in dynamic, fast-paced environments. Passion for mentoring, sharing knowledge, and raising the technical bar for data engineering practices.
Desirable Experience: Contributions to open-source data engineering/tools communities. Implementing data cataloging, stewardship, and data democratization initiatives. Hands-on work with DataOps/DevOps pipelines for code and data. Knowledge of ML pipeline integration (feature stores, model serving, lineage/monitoring integration) is beneficial.
EDUCATIONAL QUALIFICATIONS: Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or related field (or equivalent experience).
Certifications in cloud platforms (AWS, GCP, Azure) and/or data engineering (AWS Data Analytics, GCP Data Engineer, Databricks). Experience working in an Agile environment with exposure to CI/CD, Git, Jira, Confluence, and code review processes. Prior work in highly regulated or large-scale enterprise data environments (finance, healthcare, or similar) is a plus.

Posted 1 month ago


10.0 - 15.0 years

10 - 15 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

Roles: Deep database expertise. Expert in architecting data lake solutions. Deep knowledge in designing and architecting database systems. Hands-on with Apache services like Arrow, Parquet, Spark, etc. Hands-on experience with databases (Postgres, Redis) and Apache services (Parquet, Arrow, etc.). Deep data lake expertise. Long-term maintenance and support of this DB system.

Posted 1 month ago


10.0 - 17.0 years

10 - 17 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

Database architect India: Deep database expertise. Expert in architecting data lake solutions. Deep knowledge in designing and architecting database systems. Hands-on with Apache services like Arrow, Parquet, Spark, etc. Hands-on experience with databases (Postgres, Redis) and Apache services (Parquet, Arrow, etc.). Deep data lake expertise. Long-term maintenance and support of this DB system.

Posted 1 month ago


15.0 - 20.0 years

10 - 14 Lacs

Bengaluru

Work from Office

Project Role: Application Lead
Project Role Description: Lead the effort to design, build and configure applications, acting as the primary point of contact.
Must have skills: Palantir Foundry
Good to have skills: NA
Minimum 5 year(s) of experience is required.
Educational Qualification: 15 years full time education
Project Role: Lead Data Engineer
Project Role Description: Design, build and enhance applications to meet business process and requirements in Palantir Foundry.
Work experience: Minimum 6 years
Must have Skills: Palantir Foundry, PySpark, TypeScript (for customizing Workshop Forms & UI)
Good to Have Skills: Experience in PySpark, Python and SQL. Knowledge of Big Data tools & technologies. Organizational and project management experience.
Job & Key Responsibilities: Responsible for designing, developing, testing, and supporting data pipelines and applications on Palantir Foundry. Configure and customize Workshop to design and implement workflows and ontologies. Configure and customize Workshop applications, including designing Forms, Workflows, and Ontology-based interactions. Write TypeScript to create dynamic and interactive Forms in Workshop for user-driven data entry and validation. Collaborate with data engineers and stakeholders to ensure successful deployment and operation of Palantir Foundry applications. Work with stakeholders including the product owner, data, and design teams to assist with data-related technical issues, understand the requirements and design the data pipeline. Work independently, troubleshoot issues and optimize performance. Communicate design processes, ideas, and solutions clearly and effectively to team and client. Assist junior team members in improving efficiency and productivity.
Technical Experience: Proficiency in PySpark, Python and SQL with demonstrable ability to write and optimize SQL and Spark jobs. Hands-on experience with Palantir Foundry services such as Data Connection, Code Repository, Contour, Data Lineage and Health Checks. Good to have working experience with Workshop, Ontology and Slate. Hands-on experience in data engineering and building data pipelines (Code/No Code) for ELT/ETL data migration, data refinement and data quality checks on Palantir Foundry. Experience in TypeScript to create and customize Forms in Workshop, including form validation, user interactions, and data binding with Ontology. Experience in ingesting data from different external source systems using data connections and sync. Good knowledge of Spark architecture and hands-on experience in performance tuning and code optimization. Proficient in managing both structured and unstructured data, with expertise in handling various file formats such as CSV, JSON, Parquet, and ORC. Experience in developing and managing scalable architecture and managing large data sets. Good understanding of data loading mechanisms and the ability to implement strategies for capturing CDC. Nice to have: test-driven development and CI/CD workflows. Experience in version control software such as Git and working with major hosting services (e.g. Azure DevOps, GitHub, Bitbucket, GitLab). Implementing code best practices involves adhering to guidelines that enhance code readability, maintainability, and overall quality.
Educational Qualification: 15 years of full-time education

Posted 1 month ago


7.0 - 12.0 years

15 - 27 Lacs

Bengaluru

Hybrid

Labcorp is hiring a Senior Data Engineer. This person will be an integrated member of the Labcorp Data and Analytics team and work within the IT team. Play a crucial role in designing, developing and maintaining data solutions using Databricks, Fabric, Spark, PySpark and Python. Responsible for reviewing business requests and translating them into technical solutions and technical specifications. In addition, work with team members and mentor fellow developers to grow their knowledge and expertise. Work in a fast-paced, high-volume processing environment, where quality and attention to detail are vital.
RESPONSIBILITIES: Design and implement end-to-end data engineering solutions by leveraging the full suite of Databricks and Fabric tools, including data ingestion, transformation, and modeling. Design, develop and maintain end-to-end data pipelines using Spark, ensuring scalable, reliable, and cost-optimized solutions. Conduct performance tuning and troubleshooting to identify and resolve any issues. Implement data governance and security best practices, including role-based access control, encryption, and auditing. Perform effectively in a fast-paced, agile development environment.
REQUIREMENTS: 8+ years of experience in designing and implementing data solutions with at least 4+ years of experience in data engineering. Extensive experience with Databricks and Fabric, including a deep understanding of their architecture, data modeling, and real-time analytics. Minimum 6+ years of experience in Spark, PySpark and Python. Must have strong experience in SQL, Spark SQL, data modeling and RDBMS concepts. Strong knowledge of Data Fabric services, particularly Data Engineering, Data Warehouse, Data Factory, and Real-Time Intelligence. Strong problem-solving skills, with the ability to multi-task. Familiarity with security best practices in cloud environments, Active Directory, encryption, and data privacy compliance.
Communicate effectively both orally and in writing. Experience in Agile development, Scrum and Application Lifecycle Management (ALM). Preference given to current or former Labcorp employees.
EDUCATION: Bachelor's in Engineering, MCA.

Posted 1 month ago


6.0 - 10.0 years

2 - 6 Lacs

Pune

Work from Office

Req ID: 323909
We are currently seeking a Data Ingest Engineer to join our team in Pune, Maharashtra (IN-MH), India (IN).
Job Duties: The Applications Development Technology Lead Analyst is a senior level position responsible for establishing and implementing new or revised application systems and programs in coordination with the Technology team. This is a position within the Ingestion team of the DRIFT data ecosystem. The focus is on ingesting data in a timely, complete, and comprehensive fashion while using the latest technology available to Citi. The role calls for leveraging new and creative methods for repeatable data ingestion from a variety of data sources while always asking "is this the best way to solve this problem" and "am I providing the highest quality data to my downstream partners".
Responsibilities:
• Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements
• Resolve a variety of high impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
• Provide expertise in area and advanced knowledge of applications programming and ensure application design adheres to the overall architecture blueprint
• Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
• Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
• Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
• Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
• Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.
Minimum Skills Required:
• 6-10 years of relevant experience in an Apps Development or systems analysis role
• Extensive experience in system analysis and in programming of software applications
• Application development using Java, Scala, Spark
• Familiarity with event-driven applications and streaming data
• Experience with Confluent Kafka, HDFS, Hive, structured and unstructured database systems (SQL and NoSQL)
• Experience with various schema and data types: JSON, Avro, Parquet, etc.
• Experience with various ELT methodologies and formats: JDBC, ODBC, API, webhook, SFTP, etc.
• Experience working within the Agile and version control tool sets (JIRA, Bitbucket, Git, etc.)
• Ability to adjust priorities quickly as circumstances dictate
• Demonstrated leadership and project management skills
• Consistently demonstrates clear and concise written and verbal communication

Posted 1 month ago


5.0 - 8.0 years

0 - 1 Lacs

Hyderabad

Hybrid

Location: Hyderabad (Hybrid)
Please share your resume with +91 9361912009
Roles and Responsibilities: Deep understanding of Linux, networking and security fundamentals. Experience working with the AWS cloud platform and infrastructure. Experience working with infrastructure as code using Terraform or Ansible. Experience managing large Big Data clusters in production (at least one of Cloudera, Hortonworks, EMR). Excellent knowledge and solid work experience providing observability for Big Data platforms using tools like Prometheus, InfluxDB, Dynatrace, Grafana, Splunk, etc. Expert knowledge of Hadoop Distributed File System (HDFS) and Hadoop YARN. Decent knowledge of various Hadoop file formats like ORC, Parquet, Avro, etc. Deep understanding of Hive (Tez), Hive LLAP, Presto and Spark compute engines. Ability to understand query plans and optimize performance for complex SQL queries on Hive and Spark. Experience supporting Spark with Python (PySpark) and R (SparklyR, SparkR). Solid professional coding experience with at least one scripting language (Shell, Python, etc.). Experience working with Data Analysts, Data Scientists and at least one related analytical application like SAS, R-Studio, JupyterHub, H2O, etc. Able to read and understand code (Java, Python, R, Scala), with expertise in at least one scripting language like Python or Shell.
Nice to have skills: Experience with workflow management tools like Airflow, Oozie, etc. Knowledge of analytical libraries like Pandas, NumPy, SciPy, PyTorch, etc. Implementation history with Packer, Chef, Jenkins or any other similar tooling. Prior working knowledge of Active Directory and Windows OS based VDI platforms like Citrix, AWS Workspaces, etc.

Posted 1 month ago


3.0 - 8.0 years

4 - 8 Lacs

Chennai

Work from Office

Your Profile: As a senior software engineer with Capgemini, you will have 3+ years of experience in Scala with a strong project track record. Hands-on experience as a Scala/Spark developer. Hands-on SQL writing skills on RDBMS (DB2) databases. Experience in working with different file formats like JSON, Parquet, Avro, ORC and XML. Must have worked on an HDFS platform development project. Proficiency in data analysis, data profiling, and data lineage. Strong oral and written communication skills. Experience working in Agile projects.
Your Role: Work on Hadoop, Spark, Hive and SQL queries. Ability to perform code optimization for performance, scalability and configurability. Data application development at scale in the Hadoop ecosystem.
What you'll love about working here: Choosing Capgemini means having the opportunity to make a difference, whether for the world's leading businesses or for society. It means getting the support you need to shape your career in the way that works for you. It means when the future doesn't look as bright as you'd like, you have the opportunity to make change: to rewrite it. When you join Capgemini, you don't just start a new job. You become part of something bigger. A diverse collective of free-thinkers, entrepreneurs and experts, all working together to unleash human energy through technology, for an inclusive and sustainable future. At Capgemini, people are at the heart of everything we do! You can exponentially grow your career by being part of innovative projects and taking advantage of our extensive Learning & Development programs. With us, you will experience an inclusive, safe, healthy, and flexible work environment to bring out the best in you! You also get a chance to make positive social change and build a better world by taking an active role in our Corporate Social Responsibility and Sustainability initiatives. And whilst you make a difference, you will also have a lot of fun.

Posted 1 month ago


4.0 - 9.0 years

2 - 6 Lacs

Bengaluru

Work from Office

Roles and Responsibilities: 4+ years of experience as a data developer using Python. Knowledge of Spark and PySpark is preferable but not mandatory. Azure cloud experience is preferred (alternate cloud experience is fine), ideally on the Azure platform including Azure Data Lake, Databricks and Data Factory. Working knowledge of different file formats such as JSON, Parquet, CSV, etc. Familiarity with data encryption and data masking. Database experience in SQL Server is preferable, as is experience with NoSQL databases like MongoDB. Team player, reliable, self-motivated, and self-disciplined.

Posted 1 month ago


8.0 - 10.0 years

10 - 12 Lacs

Hyderabad

Work from Office

ABOUT THE ROLE
Role Description: We are seeking a highly skilled and experienced hands-on Test Automation Engineering Manager with deep expertise in Data Quality (DQ), Data Integration (DIF), and Data Governance. In this role, you will design and implement automated frameworks that ensure data accuracy, metadata consistency, and compliance throughout the data pipeline, leveraging technologies like Databricks, AWS, and cloud-native tools. You will have a major focus on Data Cataloguing and Governance, ensuring that data assets are well-documented, auditable, and secure across the enterprise. You will be responsible for the end-to-end design and development of a test automation framework, working collaboratively with the team. As the delivery owner for test automation, your primary focus will be on building and automating comprehensive validation frameworks for data cataloging, data classification, and metadata tracking, while ensuring alignment with internal governance standards. You will also work closely with data engineers, product teams, and data governance leads to enforce data quality and governance policies. Your efforts will play a key role in driving data integrity, consistency, and trust across the organization. The role is highly technical and hands-on, with a strong focus on automation, metadata validation, and ensuring data governance practices are seamlessly integrated into development pipelines.
Roles & Responsibilities:
Data Quality & Integration Frameworks: Design and implement Data Quality (DQ) frameworks that validate schema compliance, transformations, completeness, null checks, duplicates, threshold rules, and referential integrity. Build Data Integration Frameworks (DIF) that validate end-to-end data pipelines across ingestion, processing, storage, and consumption layers. Automate data validations in Databricks/Spark pipelines, integrated with AWS services like S3, Glue, Athena, and Lake Formation.
Develop modular, reusable validation components using PySpark, SQL, Python, and orchestration via CI/CD pipelines.
Data Cataloging & Governance: Integrate automated validations with AWS Glue Data Catalog to ensure metadata consistency, schema versioning, and lineage tracking. Implement checks to verify that data assets are properly cataloged, discoverable, and compliant with internal governance standards. Validate and enforce data classification, tagging, and access controls, ensuring alignment with data governance frameworks (e.g., PII/PHI tagging, role-based access policies). Collaborate with governance teams to automate policy enforcement and compliance checks for audit and regulatory needs.
Visualization & UI Testing: Automate validation of data visualizations in tools like Tableau, Power BI, Looker, or custom React dashboards. Ensure charts, KPIs, filters, and dynamic views correctly reflect backend data using UI automation (Selenium with Python) and backend validation logic. Conduct API testing (via Postman or Python test suites) to ensure accurate data delivery to visualization layers.
Technical Skills and Tools: Hands-on experience with data automation tools like Databricks and AWS is essential, as the manager will be instrumental in building and managing data pipelines. Leverage automated testing frameworks and containerization tools to streamline processes and improve efficiency. Experience in UI and API functional validation using tools such as Selenium with Python and Postman, ensuring comprehensive testing coverage.
Technical Leadership, Strategy & Team Collaboration: Define and drive the overall QA and testing strategy for UI and search-related components with a focus on scalability, reliability, and performance, while establishing alerting and reporting mechanisms for test failures, data anomalies, and governance violations.
Contribute to system architecture and design discussions, bringing a strong quality and testability lens early into the development lifecycle. Lead test automation initiatives by implementing best practices and scalable frameworks, embedding test suites into CI/CD pipelines to enable automated, continuous validation of data workflows, catalog changes, and visualization updates. Mentor and guide QA engineers, fostering a collaborative, growth-oriented culture focused on continuous learning and technical excellence. Collaborate cross-functionally with product managers, developers, and DevOps to align quality efforts with business goals and release timelines. Conduct code reviews, test plan reviews, and pair-testing sessions to ensure team-level consistency and high-quality standards.
Good-to-Have Skills: Experience with data governance tools such as Apache Atlas, Collibra, or Alation. Understanding of DataOps methodologies and practices. Familiarity with monitoring/observability tools such as Datadog, Prometheus, or CloudWatch. Experience building or maintaining test data generators. Contributions to internal quality dashboards or data observability systems. Awareness of metadata-driven testing approaches and lineage-based validations. Experience working with Agile testing methodologies such as Scaled Agile. Familiarity with automated testing frameworks like Selenium, JUnit, TestNG, or PyTest.
Must-Have Skills: Strong hands-on experience with Data Quality (DQ) framework design and automation. Expertise in PySpark, Python, and SQL for data validations. Solid understanding of ETL/ELT pipeline testing in Databricks or Apache Spark environments. Experience validating structured and semi-structured data formats (e.g., Parquet, JSON, Avro). Deep familiarity with AWS data services: S3, Glue, Athena, Lake Formation, Data Catalog. Integration of test automation with AWS Glue Data Catalog or similar catalog tools. UI automation using Selenium with Python for dashboard and web interface validation. API testing using Postman, Python, or custom API test scripts. Hands-on testing of BI tools such as Tableau, Power BI, Looker, or custom visualization layers. CI/CD test integration with tools like Jenkins, GitHub Actions, or GitLab CI. Familiarity with containerized environments (e.g., Docker, AWS ECS/EKS). Knowledge of data classification, access control validation, and PII/PHI tagging. Understanding of data governance standards (e.g., GDPR, HIPAA, CCPA). Understanding Data Structures: Knowledge of various data structures and their applications. Ability to analyze data and identify inconsistencies. Proven hands-on experience in test automation and data automation using Databricks and AWS. Strong knowledge of Data Integrity Frameworks (DIF) and Data Quality (DQ) principles. Familiarity with automated testing frameworks like Selenium, JUnit, TestNG, or PyTest. Strong understanding of data transformation techniques and logic. Education and Professional Certifications: Bachelor's degree in computer science and engineering preferred, other engineering fields considered; Master's degree and 6+ years of experience, or Bachelor's degree and 8+ years of experience. Soft Skills: Excellent analytical and troubleshooting skills. Strong verbal and written communication skills. Ability to work effectively with global, virtual teams. High degree of initiative and self-motivation. 
Ability to manage multiple priorities successfully. Team-oriented, with a focus on achieving team goals. Strong presentation and public speaking skills.

Posted 1 month ago

Apply

5.0 - 8.0 years

2 - 6 Lacs

Bengaluru

Work from Office

Job Information Job Opening ID ZR_1628_JOB Date Opened 09/12/2022 Industry Technology Job Type Work Experience 5-8 years Job Title Data Engineer City Bangalore Province Karnataka Country India Postal Code 560001 Number of Positions 4 Roles and Responsibilities: 4+ years of experience as a data developer using Python. Knowledge of Spark and PySpark preferable but not mandatory. Azure Cloud experience preferred; alternate cloud experience is fine. Preferred experience in the Azure platform, including Azure Data Lake, Databricks, and Data Factory. Working knowledge of different file formats such as JSON, Parquet, CSV, etc. Familiarity with data encryption and data masking. Database experience in SQL Server is preferable. Preferred experience in NoSQL databases like MongoDB. Team player, reliable, self-motivated, and self-disciplined.

Posted 1 month ago

Apply

8.0 - 13.0 years

30 - 35 Lacs

Bengaluru

Work from Office

Data Engineer Location: Bangalore - Onsite Experience: 8 - 15 years Type: Full-time Role Overview We are seeking an experienced Data Engineer to build and maintain scalable, high-performance data pipelines and infrastructure for our next-generation data platform. The platform ingests and processes real-time and historical data from diverse industrial sources such as airport systems, sensors, cameras, and APIs. You will work closely with AI/ML engineers, data scientists, and DevOps to enable reliable analytics, forecasting, and anomaly detection use cases. Key Responsibilities Design and implement real-time (Kafka, Spark/Flink) and batch (Airflow, Spark) pipelines for high-throughput data ingestion, processing, and transformation. Develop data models and manage data lakes and warehouses (Delta Lake, Iceberg, etc) to support both analytical and ML workloads. Integrate data from diverse sources: IoT sensors, databases (SQL/NoSQL), REST APIs, and flat files. Ensure pipeline scalability, observability, and data quality through monitoring, alerting, validation, and lineage tracking. Collaborate with AI/ML teams to provision clean and ML-ready datasets for training and inference. Deploy, optimize, and manage pipelines and data infrastructure across on-premise and hybrid environments. Participate in architectural decisions to ensure resilient, cost-effective, and secure data flows. Contribute to infrastructure-as-code and automation for data deployment using Terraform, Ansible, or similar tools. Qualifications & Required Skills Bachelors or Master’s in Computer Science, Engineering, or related field. 6+ years in data engineering roles, with at least 2 years handling real-time or streaming pipelines. Strong programming skills in Python/Java and SQL. Experience with Apache Kafka, Apache Spark, or Apache Flink for real-time and batch processing. Hands-on with Airflow, dbt, or other orchestration tools. 
Familiarity with data modeling (OLAP/OLTP), schema evolution, and format handling (Parquet, Avro, ORC). Experience with hybrid/on-prem and cloud platforms (AWS/GCP/Azure) deployments. Proficient in working with data lakes/warehouses like Snowflake, BigQuery, Redshift, or Delta Lake. Knowledge of DevOps practices, Docker/Kubernetes, Terraform or Ansible. Exposure to data observability, data cataloging, and quality tools (e.g., Great Expectations, OpenMetadata). Good-to-Have Experience with time-series databases (e.g., InfluxDB, TimescaleDB) and sensor data. Prior experience in domains such as aviation, manufacturing, or logistics is a plus.

Posted 1 month ago

Apply

4.0 - 8.0 years

4 - 8 Lacs

Gurugram

Work from Office

Capgemini Invent Capgemini Invent is the digital innovation, consulting and transformation brand of the Capgemini Group, a global business line that combines market leading expertise in strategy, technology, data science and creative design, to help CxOs envision and build what's next for their businesses. Your Role Proficiency in MS Fabric, Azure Data Factory, Azure Synapse Analytics, Azure Databricks. Extensive knowledge of MS Fabric components: Lakehouses, OneLake, Data Pipelines, Real-Time Analytics, Power BI Integration, Semantic Model. Integrate Fabric capabilities for seamless data flow, governance, and collaboration across teams. Strong understanding of Delta Lake, Parquet, and distributed data systems. Strong programming skills in Python, PySpark, Scala or Spark SQL/T-SQL for data transformations. Your Profile Strong experience in implementation and management of a lakehouse using Databricks and the Azure tech stack (ADLS Gen2, ADF, Azure SQL). Proficiency in data integration techniques, ETL processes and data pipeline architectures. Understanding of Machine Learning algorithms and AI/ML frameworks (e.g., TensorFlow, PyTorch) and Power BI is an added advantage. MS Fabric and PySpark are a must. What you will love about working here We recognize the significance of flexible work arrangements to provide support. Be it remote work, or flexible work hours, you will get an environment to maintain healthy work life balance. At the heart of our mission is your career growth. Our array of career growth programs and diverse professions are crafted to support you in exploring a world of opportunities. Equip yourself with valuable certifications in the latest technologies such as Generative AI. About Capgemini Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. 
It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong heritage of over 55 years, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fueled by its market leading capabilities in AI, cloud and data, combined with its deep industry expertise and partner ecosystem. The Group reported 2023 global revenues of €22.5 billion.

Posted 2 months ago

Apply

4.0 - 9.0 years

4 - 8 Lacs

Hyderabad

Work from Office

Data Transformation: Utilize Data Build Tool (dbt) to transform raw data into curated data models according to business requirements. Implement data transformations and aggregations to support analytical and reporting needs. Orchestration and Automation: Design and implement automated workflows using Google Cloud Composer to orchestrate data pipelines and ensure timely data delivery. Monitor and troubleshoot data pipelines, identifying and resolving issues proactively. Develop and maintain documentation for data pipelines and workflows. GCP Expertise: Leverage GCP services, including BigQuery, Cloud Storage, and Pub/Sub, to build a robust and scalable data platform. Optimize BigQuery performance and cost through efficient query design and data partitioning. Implement data security and access controls in accordance with banking industry standards. Collaboration and Communication: Collaborate with Solution Architect and Data Modeler to understand data requirements and translate them into technical solutions. Communicate effectively with team members and stakeholders, providing regular updates on project progress. Participate in code reviews and contribute to the development of best practices. Data Pipeline Development: Design, develop, and maintain scalable and efficient data pipelines using Google Cloud Dataflow to ingest data from various sources, including relational databases (RDBMS), data streams, and files. Implement data quality checks and validation processes to ensure data accuracy and consistency. Optimize data pipelines for performance and cost-effectiveness. Banking Domain Knowledge (Preferred): Understanding of banking data domains, such as customer data, transactions, and financial products. Familiarity with regulatory requirements and data governance standards in the banking industry. Required Experience: Bachelor's degree in computer science, Engineering, or a related field. ETL Knowledge. 
4-9 years of experience in data engineering, with a focus on building data pipelines and data transformations. Strong proficiency in SQL and experience working with relational databases. Hands-on experience with Google Cloud Platform (GCP) services, including Dataflow, BigQuery, Cloud Composer, and Cloud Storage. Experience with data transformation tools, preferably Data Build Tool (dbt). Proficiency in Python or other scripting languages is a plus. Experience with data orchestration and automation. Strong problem-solving and analytical skills. Excellent communication and collaboration skills. Experience with data streams like Pub/Sub or similar. Experience in working with files such as CSV, JSON and Parquet. Primary Skills: GCP, Dataflow, BigQuery, Cloud Composer, Cloud Storage, Data Pipeline, Composer, SQL, DBT, DWH Concepts. Secondary Skills: Python, Banking Domain knowledge, pub/sub, Cloud certifications (e.g. Data engineer), Git or any other version control system.

Posted 2 months ago

Apply

4.0 - 7.0 years

6 - 9 Lacs

Bengaluru

Work from Office

What this job involves: JLL, an international real estate management company, is seeking a Data Engineer to join our JLL Technologies Team. We are seeking self-starters who can work in a diverse and fast-paced environment and join our Enterprise Data team. We are looking for a candidate who is responsible for designing and developing data solutions that are strategic for the business using the latest technologies: Azure Databricks, Python, PySpark, Spark SQL, Azure Functions, Delta Lake, Azure DevOps CI/CD. Responsibilities Design, architect, and develop solutions leveraging cloud big data technology to ingest, process and analyze large, disparate data sets to exceed business requirements. Design and develop data management and data persistence solutions for application use cases leveraging relational and non-relational databases and enhancing our data processing capabilities. Develop POCs to influence platform architects, product managers and software engineers to validate solution proposals and migrate. Develop data lake solutions to store structured and unstructured data from internal and external sources and provide technical guidance to help colleagues migrate to a modern technology platform. Contribute and adhere to CI/CD processes and development best practices, and strengthen the discipline in the Data Engineering org. Develop systems that ingest, cleanse and normalize diverse datasets, develop data pipelines from various internal and external sources and build structure for previously unstructured data. Using PySpark and Spark SQL, extract, manipulate, and transform data from various sources, such as databases, data lakes, APIs, and files, to prepare it for analysis and modeling. Build and optimize ETL workflows using Azure Databricks and PySpark. This includes developing efficient data processing pipelines, data validation, error handling, and performance tuning. 
Perform unit testing, system integration testing, and regression testing, and assist with user acceptance testing. Articulate business requirements in a technical solution that can be designed and engineered. Consult with the business to develop documentation and communication materials to ensure accurate usage and interpretation of JLL data. Implement data security best practices, including data encryption, access controls, and compliance with data protection regulations. Ensure data privacy, confidentiality, and integrity throughout the data engineering processes. Perform the data analysis required to troubleshoot data-related issues and assist in their resolution. Experience & Education Minimum of 4 years of experience as a data developer using Python, PySpark, Spark SQL, SQL Server, and ETL concepts. Bachelor's degree in Information Science, Computer Science, Mathematics, Statistics or a quantitative discipline in science, business, or social science. Experience in Azure Cloud Platform, Databricks, Azure storage. Effective written and verbal communication skills, including technical writing. Excellent technical, analytical and organizational skills. Technical Skills & Competencies Experience handling unstructured and semi-structured data, working in a data lake environment, leveraging data streaming and developing data pipelines driven by events/queues. Hands-on experience with and knowledge of real-time/near-real-time processing, and ready to code. Hands-on experience in PySpark, Databricks, and Spark SQL. Knowledge of JSON, Parquet and other file formats, and the ability to work effectively with them. Knowledge of NoSQL databases like HBase, Mongo, Cosmos, etc. Preferred cloud experience on Azure or AWS: Python-Spark, Spark Streaming, Azure SQL Server, Cosmos DB/MongoDB, Azure Event Hubs, Azure Data Lake Storage, Azure Search, etc. 
Team player, Reliable, self-motivated, and self-disciplined individual capable of executing on multiple projects simultaneously within a fast-paced environment working with cross functional teams.

Posted 2 months ago

Apply

3.0 - 5.0 years

50 - 60 Lacs

Bengaluru

Work from Office

Staff Data Engineer Experience: 3 - 5 Years Salary: INR 50-60 Lacs per annum Preferred Notice Period: Within 30 Days Shift: 4:00PM to 1:00AM IST Opportunity Type: Remote Placement Type: Permanent (*Note: This is a requirement for one of Uplers' Clients) Must have skills required: ClickHouse, DuckDB, AWS, Python, SQL Good to have skills: DBT, Iceberg, Kestra, Parquet, SQLGlot Rill Data (One of Uplers' Clients) is looking for: A Staff Data Engineer who is passionate about their work, eager to learn and grow, and committed to delivering exceptional results. If you are a team player, with a positive attitude and a desire to make a difference, then we want to hear from you. Role Overview Description Rill is the world's fastest BI tool, designed from the ground up for real-time databases like DuckDB and ClickHouse. Our platform combines last-mile ETL, an in-memory database, and interactive dashboards into a full-stack solution that's easy to deploy and manage. With a BI-as-code approach, Rill empowers developers to define and collaborate on metrics using SQL and YAML. Trusted by leading companies in e-commerce, digital marketing, and financial services, Rill provides the speed and scalability needed for operational analytics and partner-facing reporting. Job Summary Overview Rill is looking for a Staff Data Engineer to join our Field Engineering team. In this role, you will work closely with enterprise customers to design and optimize high-performance data pipelines powered by DuckDB and ClickHouse. You will also collaborate with our platform engineering team to evolve our incremental ingestion architectures and support proof-of-concept sales engagements. The ideal candidate has strong SQL fluency, experience with orchestration frameworks (e.g., Kestra, dbt, SQLGlot), familiarity with data lake table formats (e.g., Iceberg, Parquet), and an understanding of cloud databases (e.g., Snowflake, BigQuery). 
Most importantly, you should have a passion for solving real-world data engineering challenges at scale. Key Responsibilities Collaborate with enterprise customers to optimize data models for performance and cost efficiency. Work with the platform engineering team to enhance and refine our incremental ingestion architectures. Partner with account executives and solution architects to rapidly prototype solutions for proof-of-concept sales engagements. Qualifications (required) Fluency in SQL and competency in Python. Bachelor's degree in a STEM discipline or equivalent industry experience. 3+ years of experience in a data engineering or related role. Familiarity with major cloud environments (AWS, Google Cloud, Azure). Benefits Competitive salary, health insurance, flexible vacation policy. How to apply for this opportunity: Easy 3-Step Process: 1. Click on Apply and register or log in on our portal. 2. Upload your updated resume and complete the screening form. 3. Increase your chances of getting shortlisted and meet the client for the interview! About Our Client: Rill is an operational BI tool that provides fast dashboards that your team will actually use. Data teams build fewer, more flexible dashboards for business users, while business users make faster decisions and perform root cause analysis, with fewer ad hoc requests. Rill's unique architecture combines a last-mile ETL service, an in-memory database, and operational dashboards - all in a single solution. Our customers are leading media & advertising platforms, including Comcast's Freewheel, tvScientific, AT&T's DishTV, and more. About Uplers: Our goal is to make hiring and getting hired reliable, simple, and fast. Our role will be to help all our talents find and apply for relevant product and engineering job opportunities and progress in their career. (Note: There are many more opportunities apart from this on the portal.) 
So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!

Posted 2 months ago

Apply

5.0 - 10.0 years

8 - 18 Lacs

Hyderabad

Work from Office

Position Title: Senior Software Engineer Job Location: Hyderabad Education: CSE, ECE, IT, EEE Essential: Python, Vue JS, JavaScript, PostgreSQL Desired: Deployment, GitLab, Linux Knowledge: AWS, Docker, ETL, Keycloak Experience: 4 to 8 Years: Experience in building and deploying web applications using the Python and Vue JS/React ecosystem. Experience in the deployment process for web servers (Django) and Vue JS/React JS applications using Nginx or Apache. Summary of work Environment and Work performed: Develop and maintain web-based applications (including Mobile Web) using mainly the Python and JavaScript programming languages and the Django, Vue JS, and React JS frameworks. Specific Duties: Full stack developer who can understand and write reusable, testable and efficient code using the Python and JavaScript languages. Responsible for regular communication with others involved in the development process. Design and implementation of low-latency, high-availability, and performant applications. Desired Outcome: Ability to design performant applications with testable and reusable code. Ability to adapt to different programming languages or frameworks based on requirements. Able to participate in code reviews and design reviews. Skills and abilities: Python: Django, Django REST framework. JavaScript: Vue JS, React JS, Quasar Framework. Databases: PostgreSQL, Redis, MSSQL, Influx, ETL. Deployment: Nginx, Apache, Linux. Source Control: Git, GitLab, CI/CD. Best Regards, Kodandapani.D Executive-HR Dept:Human Resource 8341119158 Kodandapani.dabbugunta@tmeic.in Kodandapaani Yadav Dabbugunta | LinkedIn TMEIC Industrial Systems India Pvt. Ltd. Group company of TMEIC Corporation, Japan. Address: Unit No. 06-01, Level-6, Block-2, Cyber Pearl, HITEC City, Madhapur, Hyderabad 500081, Telangana. Website: www.tmeic.com

Posted 2 months ago

Apply

5.0 - 10.0 years

20 - 35 Lacs

Bengaluru

Work from Office

Job Title: Senior Data Engineer - ML & Azure Platform Location: Bangalore Experience: 5 - 10 years Joining Timeframe: Only candidates who can join within 1 month will be considered. Job Description: We are seeking a skilled Senior Data Engineer to work on end-to-end data engineering and data science use cases. The ideal candidate will have strong expertise in Python or Scala, Spark (Databricks), and SQL, and experience building scalable and efficient data pipelines on Azure. Primary Skills: Azure Data Platform: Data Factory, Databricks. Strong experience in SQL and Python or Scala. Experience with ETL/ELT pipelines and transformations. Knowledge of Spark, Delta Lake, Parquet, and Big Data technologies. Familiarity with MLOps, CI/CD pipelines, model monitoring, versioning. Performance tuning and pipeline optimization. Data quality checks and feature engineering. Nice-to-Have Skills: Exposure to NLP, time-series forecasting, anomaly detection. Knowledge of data governance frameworks. Understanding of retail or workforce analytics domains. Note: Please apply only if you're available to join within 1 month. To Apply: Kindly share your updated resume, current CTC, expected CTC and notice period to vijay.s@xebia.com.

Posted 2 months ago

Apply

5.0 - 10.0 years

1 - 5 Lacs

Bengaluru

Work from Office

Job Title: AWS Data Engineer Experience: 5-10 Years Location: Bangalore Technical Skills: 5+ years of experience as an AWS Data Engineer with AWS S3, Glue Catalog, Glue Crawler, Glue ETL, and Athena. Write Glue ETLs to convert data in AWS RDS for SQL Server and Oracle DB to Parquet format in S3. Execute Glue crawlers to catalog S3 files. Create a catalog of S3 files for easier querying. Create SQL queries in Athena. Define data lifecycle management for S3 files. Strong experience in developing, debugging, and optimizing Glue ETL jobs using PySpark or Glue Studio. Ability to connect Glue ETLs with AWS RDS (SQL Server and Oracle) for data extraction and write transformed data into Parquet format in S3. Proficiency in setting up and managing Glue Crawlers to catalog data in S3. Deep understanding of S3 architecture and best practices for storing large datasets. Experience in partitioning and organizing data for efficient querying in S3. Knowledge of Parquet file format advantages for optimized storage and querying. Expertise in creating and managing the AWS Glue Data Catalog to enable structured and schema-aware querying of data in S3. Experience with Amazon Athena for writing complex SQL queries and optimizing query performance. Familiarity with creating views or transformations in Athena for business use cases. Knowledge of securing data in S3 using IAM policies, S3 bucket policies, and KMS encryption. Understanding of regulatory requirements (e.g., GDPR) and implementing secure data handling practices. Non-Technical Skills: Candidate needs to be a good team player with effective interpersonal, team building and communication skills. Ability to communicate complex technology to a non-technical audience in a simple and precise manner.

Posted 2 months ago

Apply

3.0 - 7.0 years

6 - 10 Lacs

Mumbai

Work from Office

Role Overview: Looking for a Kafka SME to design and support real-time data ingestion pipelines using Kafka within a Cloudera-based Lakehouse architecture. Key Responsibilities: Design Kafka topics, partitions, schema registry. Implement producer-consumer apps using Spark Structured Streaming. Set up Kafka Connect, monitoring, and alerts. Ensure secure, scalable message delivery. Required education: Bachelor's Degree. Preferred education: Master's Degree. Required technical and professional expertise Skills Required: Deep understanding of Kafka internals and ecosystem. Integration with Cloudera and NiFi. Schema evolution and serialization (Avro, Parquet). Performance tuning and fault-tolerance. Preferred technical and professional experience: Good communication skills. India market experience is preferred.

Posted 2 months ago

Apply

5.0 - 8.0 years

7 - 10 Lacs

Mumbai

Work from Office

So, what's the job? You'll lead the design, development, and optimization of scalable, maintainable, and high-performance ETL/ELT pipelines using Informatica IDMC CDI. You'll manage and optimize cloud-based storage environments, including AWS S3 buckets. You'll implement robust data integration solutions that ingest, cleanse, transform, and deliver structured and semi-structured data from diverse sources to downstream systems and data warehouses. You'll support data integration from source systems, ensuring data quality and completeness. You'll automate data loading and transformation processes using tools such as Python, SQL, and orchestration frameworks. You'll contribute to the strategic transition toward cloud-native data platforms (e.g., AWS S3, Snowflake) by designing hybrid or fully cloud-based data solutions. You'll collaborate with Data Architects to align data models and structures with enterprise standards. You'll maintain clear documentation of data pipelines, processes, and technical standards, and mentor team members in best practices and tool usage. You'll implement and enforce data security, access controls, and compliance measures in line with organizational policies. And what are we looking for? You'll have a Bachelor's degree in Computer Science, Engineering, or a related field with a minimum of 5 years of industry experience. You'll be an expert in designing, developing, and optimizing ETL/ELT pipelines using Informatica IDMC Cloud Data Integration (CDI). You'll bring strong experience with data ingestion, transformation, and delivery across diverse data sources and targets. You'll have a deep understanding of data integration patterns, orchestration strategies, and data pipeline lifecycle management. You'll be proficient in implementing incremental loads, CDC (Change Data Capture), and data synchronization. You'll bring strong experience with SQL Server, including performance tuning, stored procedures, and indexing strategies. 
You'll possess a solid understanding of data modeling, data warehousing concepts (star/snowflake schema), and dimensional modeling. You'll have experience integrating with cloud data warehouses such as Snowflake. You'll be familiar with cloud storage and compute platforms such as AWS S3, EC2, Lambda, Glue, and RDS. You'll design and implement cloud-native data architectures using modern tools and best practices. You'll have exposure to data migration and hybrid architecture design (on-prem to cloud). You'll be experienced with Informatica Intelligent Cloud Services (IICS), especially IDMC CDI. You'll have strong proficiency in SQL, T-SQL, and scripting languages like Python or Shell. You'll have experience with workflow orchestration tools like Apache Airflow, Informatica task flows, or Control-M. You'll be knowledgeable in API integration, REST/SOAP, and file-based data exchange (e.g., SFTP, CSV, Parquet). You'll implement data validation, error handling, and data quality frameworks. You'll have an understanding of data lineage, metadata management, and governance best practices. You'll set up monitoring, logging, and alerting for ETL processes.

Posted 2 months ago

Apply

10.0 - 15.0 years

25 - 35 Lacs

Pune

Work from Office

Education and Qualifications: Bachelor's degree in IT, Computer Science, Software Engineering, Business Analytics or equivalent. Work Experience: Minimum 10 years of experience in the data analytics field. Minimum 6 years of experience in running operations and support in a Cloud Data Lakehouse environment. Experience with Azure Databricks. Experience in building and optimizing data pipelines, architectures and data sets. Excellent experience in Scala or Python. Ability to troubleshoot and optimize complex queries on the Spark platform. Knowledgeable on structured and unstructured data design/modeling, data access and data storage techniques. Experience with DevOps tools and environment. Technical / Professional Skills (please provide at least 3): Azure Databricks; Python / Scala / Java; Hive / HBase / Impala / Parquet; Sqoop, Kafka, Flume; SQL and RDBMS; Airflow; Jenkins / Bamboo; GitHub / Bitbucket; Nexus. Have you worked on sizing clusters for Databricks in an Azure cloud environment? Have you done hands-on configuration and administration of the Databricks platform on Azure Cloud? Do you have experience in cluster management, storage management, workspace management, key management, etc.? Have you done cost optimization exercises to reduce the consumption cost of Databricks clusters? Have you done cost forecasting of the Databricks platform on Azure Cloud? How do you monitor cost anomalies, identify cost drivers and come up with recommendations? Have you done any RBAC configuration in the Databricks platform on Azure Cloud? Have you configured connectivity from Databricks to internal/external sources/applications such as Power BI, Google Analytics, SharePoint, etc.? What have you implemented, and how do you monitor the health of the Databricks platform, its services, the ETL pipelines and the end-points? What kind of proactive or self-healing processes are put in place to ensure service availability?

Posted 2 months ago

Apply

9 - 11 years

37 - 40 Lacs

Ahmedabad, Bengaluru, Mumbai (All Areas)

Work from Office

Dear Candidate, We are hiring a Scala Developer to work on high-performance distributed systems, leveraging the power of functional and object-oriented paradigms. This role is perfect for engineers passionate about clean code, concurrency, and big data pipelines. Key Responsibilities: Build scalable backend services using Scala and the Play or Akka frameworks. Write concurrent and reactive code for high-throughput applications. Integrate with Kafka, Spark, or Hadoop for data processing. Ensure code quality through unit tests and property-based testing. Work with microservices, APIs, and cloud-native deployments. Required Skills & Qualifications: Proficient in Scala, with a strong grasp of functional programming. Experience with Akka, Play, or Cats. Familiarity with Big Data tools and RESTful API development. Bonus: Experience with ZIO, Monix, or Slick. Soft Skills: Strong troubleshooting and problem-solving skills. Ability to work independently and in a team. Excellent communication and documentation skills. Note: If interested, please share your updated resume and preferred time for a discussion. If shortlisted, our HR team will contact you. Kandi Srinivasa Reddy Delivery Manager Integra Technologies

Posted 2 months ago

Apply
Page 3 of 3

Featured Companies