
3921 PySpark Jobs - Page 25

JobPe aggregates these listings for easy access; applications are submitted directly on the original job portal.

2.0 years

5 - 8 Lacs

Hyderābād

On-site

Source: Glassdoor

At CGI, we’re a team of builders. We call our employees members because all who join CGI are building their own company - one that has grown to 72,000 professionals located in 40 countries. Founded in 1976, CGI is a leading IT and business process services firm committed to helping clients succeed. We have the global resources, expertise, stability and dedicated professionals needed to achieve results for our clients - and for our members. Come grow with us. Learn more at www.cgi.com.

This is a great opportunity to join a winning team. CGI offers a competitive compensation package with opportunities for growth and professional development. Benefits for full-time, permanent members start on the first day of employment and include a paid time-off program and profit participation and stock purchase plans. We wish to thank all applicants for their interest and effort in applying for this position; however, only candidates selected for interviews will be contacted. No unsolicited agency referrals, please.

Job Title: Python with SQL
Position: Software Engineer
Experience: 2-3 years
Category: Software Development/Engineering
Location: Hyderabad, Chennai, Bangalore
Employment Type: Full Time

Your future duties and responsibilities
We are seeking a highly skilled and detail-oriented Python and SQL Developer to join our team. The ideal candidate will be responsible for developing and maintaining data-driven applications, building efficient and scalable solutions, and working with databases to extract and manipulate data for analysis and reporting.
- Design, develop, and maintain scalable Python applications and microservices.
- Write complex and optimized SQL queries for data extraction, transformation, and loading (ETL).
- Develop and automate data pipelines integrating various data sources (REST APIs, files, databases); a small Python sketch follows this listing.
- Work with large datasets in relational databases such as PostgreSQL, MySQL, or SQL Server.
- Collaborate with data engineers, analysts, and product teams to build high-quality data solutions.
- Implement unit testing, logging, and error handling to ensure software reliability.
- Optimize database performance and troubleshoot query issues.
- Participate in architecture discussions and code reviews.
- 2+ years of professional experience with Python and SQL in production environments.
- Deep understanding of Python core concepts including data structures, OOP, exception handling, and multi-threading.
- Experience with SQL query optimization, stored procedures, indexing, and partitioning.
- Strong experience with Python libraries such as Pandas, NumPy, SQLAlchemy, PySpark, or similar.
- Familiarity with ETL pipelines, data validation, and data integration.
- Experience with Git, CI/CD tools, and development best practices.
- Excellent problem-solving skills and ability to debug complex systems.

Required qualifications to be successful in this role
- Experience with cloud platforms (AWS RDS, GCP BigQuery, Azure SQL, etc.).
- Exposure to Docker, Kubernetes, or serverless architectures.
- Understanding of data warehousing and business intelligence concepts.
- Prior experience working in Agile/Scrum environments.

Years of experience: 2+ | Relevant experience: 2+
Locations: Hyderabad, Bangalore, Chennai
Education: BTech, MTech, BSc
Notice period: Immediate to 30 days (serving notice)

Together, as owners, let’s turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and belonging. Here, you’ll reach your full potential because you are invited to be an owner from day 1 as we work together to bring our Dream to life. That’s why we call ourselves CGI Partners rather than employees. We benefit from our collective success and actively shape our company’s strategy and direction. Your work creates value. You’ll develop innovative solutions and build relationships with teammates and clients while accessing global capabilities to scale your ideas, embrace new opportunities, and benefit from expansive industry and technology expertise. You’ll shape your career by joining a company built to grow and last. You’ll be supported by leaders who care about your health and well-being and provide you with opportunities to deepen your skills and broaden your horizons. Come join our team - one of the largest IT and business consulting services firms in the world.
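A minimal sketch of the kind of Python-plus-SQL pipeline work this role describes: pull records from a REST API with requests, do a basic validation step in pandas, and load them into a staging table via SQLAlchemy. The endpoint, column names, table name, and connection string are hypothetical placeholders, not details from the posting.

```python
# Hypothetical API-to-PostgreSQL load; endpoint, table, and credentials are placeholders.
import pandas as pd
import requests
from sqlalchemy import create_engine

def load_orders(api_url: str, conn_str: str) -> int:
    resp = requests.get(api_url, timeout=30)
    resp.raise_for_status()
    df = pd.DataFrame(resp.json())           # assumes the API returns a JSON list of records
    df = df.dropna(subset=["order_id"])      # simple validation before loading
    engine = create_engine(conn_str)
    df.to_sql("stg_orders", engine, if_exists="append", index=False)
    return len(df)

if __name__ == "__main__":
    rows = load_orders("https://example.com/api/orders",
                       "postgresql+psycopg2://user:password@host:5432/analytics")
    print(f"Loaded {rows} rows")
```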

Posted 6 days ago

Apply

5.0 years

4 - 5 Lacs

Hyderābād

On-site

Source: Glassdoor

Job Description:
- At least 5+ years of relevant hands-on development experience in an Azure Data Engineering role.
- Proficient in Azure technologies such as ADB (Azure Databricks), ADF, SQL (able to write complex SQL queries), PySpark, Python, Synapse, Delta Tables, and Unity Catalog (a short PySpark-to-Delta sketch follows below).
- Hands-on in Python, PySpark or Spark SQL.
- Hands-on in Azure Analytics and DevOps.
- Takes part in Proof of Concepts (POCs) and pilot solution preparation.
- Able to conduct data profiling, cataloguing, and mapping for technical design and construction of technical data flows.
- Experience in business process mapping for data and analytics solutions.

Recruitment fraud is a scheme in which fictitious job opportunities are offered to job seekers, typically through online services such as false websites or through unsolicited emails claiming to be from the company. These emails may ask recipients to provide personal information or to make payments as part of the illegitimate recruiting process. DXC does not make offers of employment via social media networks, and DXC never asks for money or payments from applicants at any point in the recruitment process, nor does it ask a job seeker to purchase IT or other equipment on its behalf. More information on employment scams is available here.
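To illustrate the ADLS-to-Delta work listed above, here is a hedged PySpark sketch that reads raw CSVs from an ADLS Gen2 container, applies a couple of typed transformations, and writes a partitioned Delta table. It assumes an Azure Databricks cluster with Delta Lake available; the storage path, columns, and table name are placeholders.

```python
# Sketch only: raw CSV from ADLS -> cleaned, partitioned Delta table on Databricks.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sales_to_delta").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .csv("abfss://raw@mydatalake.dfs.core.windows.net/sales/2024/*.csv"))

clean = (raw
         .withColumn("amount", F.col("amount").cast("double"))
         .withColumn("sale_date", F.to_date("sale_date", "yyyy-MM-dd"))
         .dropDuplicates(["sale_id"]))

(clean.write
      .format("delta")
      .mode("overwrite")
      .partitionBy("sale_date")
      .saveAsTable("curated.sales"))
```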

Posted 6 days ago

Apply

1.0 - 6.0 years

3 - 8 Lacs

Hyderabad

Work from Office

Source: Naukri

We are seeking an MDM Associate Data Engineer with 2-5 years of experience to support and enhance our enterprise MDM (Master Data Management) platforms using Informatica/Reltio. This role is critical in delivering high-quality master data solutions across the organization, utilizing modern tools like Databricks and AWS to drive insights and ensure data reliability. The ideal candidate will have strong SQL and data profiling skills and experience working with cross-functional teams in a pharma environment. To succeed in this role, the candidate must have strong data engineering experience along with MDM knowledge; candidates with only MDM experience are not eligible. Candidates must have data engineering experience with technologies such as SQL, Python, PySpark, Databricks and AWS, along with knowledge of MDM (Master Data Management).

Roles & Responsibilities:
- Analyze and manage customer master data using Reltio or Informatica MDM solutions.
- Perform advanced SQL queries and data analysis to validate and ensure master data integrity (see the profiling sketch below).
- Leverage Python, PySpark, and Databricks for scalable data processing and automation.
- Collaborate with business and data engineering teams for continuous improvement in MDM solutions.
- Implement data stewardship processes and workflows, including approval and DCR mechanisms.
- Utilize AWS cloud services for data storage and compute processes related to MDM.
- Contribute to metadata and data modeling activities.
- Track and manage data issues using tools such as JIRA and document processes in Confluence.
- Apply Life Sciences/Pharma industry context to ensure data standards and compliance.

Basic Qualifications and Experience:
- Master's degree with 1-3 years of experience in Business, Engineering, IT or a related field, OR
- Bachelor's degree with 2-5 years of experience in Business, Engineering, IT or a related field, OR
- Diploma with 6-8 years of experience in Business, Engineering, IT or a related field.

Must-Have Skills:
- Advanced SQL expertise and data wrangling.
- Strong experience in Python and PySpark for data transformation workflows.
- Strong experience with Databricks and AWS architecture.
- Knowledge of MDM, data governance, stewardship, and profiling practices.
- In addition to the above, candidates with experience on Informatica or Reltio MDM platforms will be preferred.

Good-to-Have Skills:
- Experience with IDQ, data modeling and approval workflow/DCR.
- Background in Life Sciences/Pharma industries.
- Familiarity with project tools like JIRA and Confluence.
- Strong grip on data engineering concepts.

Professional Certifications:
- Any ETL certification (e.g. Informatica).
- Any data analysis certification (SQL, Python, Databricks).
- Any cloud certification (AWS or Azure).

Soft Skills:
- Strong analytical abilities to assess and improve master data processes and solutions.
- Excellent verbal and written communication skills, with the ability to convey complex data concepts clearly to technical and non-technical stakeholders.
- Effective problem-solving skills to address data-related issues and implement scalable solutions.
- Ability to work effectively with global, virtual teams.
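As a hedged illustration of the master-data validation work mentioned above, the sketch below profiles a customer master table with PySpark: null counts on key attributes and a duplicate check on the golden-record key. The table and column names are hypothetical, not taken from the posting.

```python
# Illustrative MDM profiling checks; table and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mdm_profiling").getOrCreate()
customers = spark.table("mdm.customer_master")

# Null counts per key attribute.
null_counts = customers.select([
    F.sum(F.col(c).isNull().cast("int")).alias(c)
    for c in ["customer_id", "name", "country", "source_system"]
])
null_counts.show()

# Golden records that share the same master ID.
dupes = customers.groupBy("customer_id").count().filter("count > 1")
print(f"Duplicate master IDs: {dupes.count()}")
```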

Posted 6 days ago

Apply

7.0 years

0 Lacs

Delhi

On-site

Source: Glassdoor

Description

Shape the Future of Work with Eptura
At Eptura, we're not just another tech company - we're a global leader transforming the way people, workplaces, and assets connect. Our innovative worktech solutions empower 25 million users across 115 countries to thrive in a digitally connected world. Trusted by 45% of Fortune 500 companies, we're redefining workplace innovation and driving success for organizations around the globe.

Job Description
We are seeking a Technical Lead - Data Engineering to spearhead the design, development, and optimization of complex data pipelines and ETL processes. This role requires deep expertise in data modeling, cloud platforms, and automation to ensure high-quality, scalable solutions. You will collaborate closely with stakeholders, engineers, and business teams to drive data-driven decision-making across our organization.

Responsibilities
- Work with stakeholders to understand data requirements and architect end-to-end ETL solutions.
- Design and maintain data models, including schema design and optimization.
- Develop and automate data pipelines to ensure quality, consistency, and efficiency.
- Lead the architecture and delivery of key modules within data platforms.
- Build and refine complex data models in Power BI, simplifying data structures with dimensions and hierarchies.
- Write clean, scalable code using Python, Scala, and PySpark (must-have skills).
- Test, deploy, and continuously optimize applications and systems.
- Mentor team members and participate in engineering hackathons to drive innovation.

About You
- 7+ years of experience in Data Engineering, with at least 2 years in a leadership role.
- Strong expertise in Python, PySpark, and SQL for data processing and transformation.
- Hands-on experience with Azure cloud computing, including Azure Data Factory and Databricks.
- Proficiency in analytics/visualization tools: Power BI, Looker, Tableau, IBM Cognos.
- Strong understanding of data modeling, including dimensions and hierarchy structures.
- Experience working with Agile methodologies and DevOps practices (GitLab, GitHub).
- Excellent communication and problem-solving skills in cross-functional environments.
- Ability to reduce added cost, complexity, and security risks with scalable analytics solutions.

Nice to have:
- Experience working with NoSQL databases (Cosmos DB, MongoDB).
- Familiarity with AutoCAD and building systems for advanced data visualization.
- Knowledge of identity and security protocols, such as SAML, SCIM, and FedRAMP compliance.

Benefits
- Health insurance fully paid for spouse, children, and parents
- Accident insurance fully paid
- Flexible working allowance
- 25 days holidays
- 7 paid sick days
- 10 public holidays
- Employee Assistance Program

Eptura Information
Follow us on Twitter | LinkedIn | Facebook | YouTube
Eptura is an Equal Opportunity Employer. At Eptura we promote our flexible workspace environment, free from discrimination. We believe that diversity of experience, perspective, and background leads to a better environment for all our people and a better product for our customers. Everyone is welcome at Eptura, no matter where you are from, and the more diverse we are, the more unified we will be in ensuring respectful connections all around the world.

About Eptura
Ready to make a difference? Explore opportunities with Eptura and join us on this incredible journey. Joining Eptura means becoming part of a forward-thinking, dynamic team that's on a mission to shape a better, more connected future. We're seeking passionate, driven individuals who want to make a real impact and be at the forefront of workplace innovation. At Eptura, diversity and inclusion are at the heart of what we do. We believe that embracing unique perspectives and backgrounds leads to stronger teams and better solutions for our customers. We are committed to creating a flexible, inclusive environment where everyone is welcome and empowered to succeed.

Posted 6 days ago

Apply

12.0 - 22.0 years

8 - 18 Lacs

Pune, Bengaluru

Hybrid

Source: Naukri

Role & responsibilities
- Understand the business area that the project is involved with.
- Work with data stewards to understand the data sources.
- Develop a clear understanding of data entities, relationships, cardinality etc. for the inbound sources, based on inputs from the data stewards / source system experts.
- Performance tuning: understand the overall requirement and the reporting impact.
- Data modeling for the business and reporting models as per the reporting needs or delivery needs to other downstream systems.
- Experience with components and languages like Databricks, Python, PySpark, Scala, R.
- Ability to ask strong questions to help the team see areas that may lead to problems.
- Ability to validate the data by writing SQL queries and comparing against the source system and transformation mapping (see the reconciliation sketch below).
- Work closely with teams to collect and translate information requirements into data to develop data-centric solutions.
- Ensure that industry-accepted data architecture principles and standards are integrated and followed for modeling, stored procedures, replication, regulations, and security, among other concepts, to meet technical and business goals.
- Continuously improve the quality, consistency, accessibility, and security of our data activity across company needs.
- Experience with the Azure DevOps project tracking tool or equivalent tools like JIRA.
- Outstanding verbal and non-verbal communication.
- Experience with, and desire to work in, a global delivery environment.
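A rough sketch of the source-versus-target validation mentioned in the list above: aggregate both sides per business date and surface mismatching row counts or amounts. Table and column names are illustrative assumptions only.

```python
# Hedged reconciliation sketch between a staging source and the modelled target.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("recon").getOrCreate()

src = (spark.table("staging.orders")
       .groupBy("business_date")
       .agg(F.count("*").alias("src_rows"), F.sum("amount").alias("src_amount")))

tgt = (spark.table("reporting.fact_orders")
       .groupBy("business_date")
       .agg(F.count("*").alias("tgt_rows"), F.sum("amount").alias("tgt_amount")))

# Full outer join keeps dates that exist on only one side.
diff = (src.join(tgt, "business_date", "full_outer")
        .filter("src_rows <> tgt_rows OR abs(src_amount - tgt_amount) > 0.01"))
diff.show(truncate=False)
```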

Posted 6 days ago

Apply

2.0 years

0 Lacs

Raipur

On-site

Source: Glassdoor

Company Name: Interbiz Consulting Pvt Ltd
Position/Designation: Data Engineer
Job Location: Raipur (C.G.)
Mode: Work from office
Experience: 2 to 5 years

We are seeking a talented and detail-oriented Data Engineer to join our growing Data & Analytics team. You will be responsible for building and maintaining robust, scalable data pipelines and infrastructure to support data-driven decision-making across the organization.

Key Responsibilities
- Design and implement ETL/ELT data pipelines for structured and unstructured data using Azure Data Factory, Databricks, or Apache Spark.
- Work with Azure Blob Storage, Data Lake, and Synapse Analytics to build scalable data lakes and warehouses.
- Develop real-time data ingestion pipelines using Apache Kafka, Apache Flink, or Apache Beam (see the streaming sketch after this listing).
- Build and schedule jobs using orchestration tools like Apache Airflow or Dagster.
- Perform data modeling using the Kimball methodology to build dimensional models in Snowflake or other data warehouses.
- Implement data versioning and transformation using DBT and Apache Iceberg or Delta Lake.
- Manage data cataloging and lineage using tools like Marquez or Collibra.
- Collaborate with DevOps teams to containerize solutions using Docker, manage infrastructure with Terraform, and deploy on Kubernetes.
- Set up and maintain monitoring and alerting systems using Prometheus and Grafana for performance and reliability.

Required Skills and Qualifications
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
- [1-5+] years of experience in data engineering or related roles.
- Proficiency in Python, with strong knowledge of OOP and data structures & algorithms.
- Comfortable working in Linux environments for development and deployment.
- Strong command of SQL and understanding of relational (DBMS) and NoSQL databases.
- Solid experience with Apache Spark (PySpark/Scala).
- Familiarity with real-time processing tools like Kafka, Flink, or Beam.
- Hands-on experience with Airflow, Dagster, or similar orchestration tools.
- Deep experience with Microsoft Azure, especially Azure Data Factory, Blob Storage, Synapse, Azure Functions, etc.; AZ-900 or other Azure certifications are a plus.
- Knowledge of dimensional modeling, Snowflake, Apache Iceberg, and Delta Lake.
- Understanding of modern Lakehouse architecture and related best practices.
- Familiarity with Marquez, Collibra, or other cataloging tools.
- Experience with Terraform, Docker, Kubernetes, and Jenkins or equivalent CI/CD tools.
- Proficiency in setting up dashboards and alerts with Prometheus and Grafana.

Interested candidates may share their CV at swapna.rani@interbizconsulting.com or visit www.interbizconsulting.com. Note: immediate joiners will be preferred.

Job Type: Full-time
Pay: From ₹25,000.00 per month
Benefits: Food provided, health insurance, leave encashment, Provident Fund
Supplemental Pay: Yearly bonus

Application Question(s):
- Do you have at least 2 years of work experience in Python?
- Do you have at least 2 years of work experience in Data Science?
- Are you from Raipur, Chhattisgarh?
- Are you willing to work for more than 2 years?
- What is your notice period?
- What is your current salary and what are you expecting?

Work Location: In person
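To illustrate the real-time ingestion responsibility referenced in this listing, here is a hedged Spark Structured Streaming sketch that reads JSON events from Kafka and lands them in a bronze Delta table. It assumes the spark-sql-kafka connector and Delta Lake are available on the cluster; the broker, topic, schema, and paths are placeholders.

```python
# Sketch only: Kafka JSON events -> bronze Delta table via Structured Streaming.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("events_stream").getOrCreate()

schema = (StructType()
          .add("event_id", StringType())
          .add("event_type", StringType())
          .add("value", DoubleType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Continuous append into the bronze layer; checkpoint path tracks progress.
(events.writeStream
       .format("delta")
       .option("checkpointLocation", "/lake/_checkpoints/events")
       .start("/lake/bronze/events"))
```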

Posted 6 days ago

Apply

5.0 - 10.0 years

15 - 25 Lacs

Hyderabad, Pune, Bengaluru

Hybrid

Source: Naukri

We are looking for a skilled Data Engineer with expertise in Azure Data Factory, ADLS, SQL, PySpark, Python and Azure Databricks to design, build and optimize data pipelines, ensuring efficient data ingestion, transformation and storage solutions.

Key Responsibilities:
- Design and develop data pipelines using Azure Data Factory and SSIS.
- Manage cloud data storage and processing (AWS S3, ADLS, etc.).
- Write complex SQL queries and optimize their performance.
- Process large datasets using PySpark.
- Develop scripts for data processing, automation and API integration using Python.
- Develop Databricks notebooks, manage workflows and implement Delta Lake for data transactions (an upsert sketch follows below).
- Pipeline orchestration and monitoring.
- Knowledge of CI/CD using Azure DevOps / GitHub.
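As a sketch of the Delta Lake workflow item above, the snippet below performs an incremental upsert (MERGE) from a landing zone into a curated Delta table, as one might do in a Databricks notebook. The paths, table names, and join key are assumptions for illustration, not requirements from the posting.

```python
# Hedged incremental upsert into a Delta table; names and paths are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("upsert_customers").getOrCreate()

# New and changed records dropped into the landing zone by an upstream ADF pipeline.
updates = spark.read.parquet("abfss://landing@mylake.dfs.core.windows.net/customers/")
target = DeltaTable.forName(spark, "curated.customers")

(target.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```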

Posted 6 days ago

Apply

5.0 years

0 Lacs

Trivandrum, Kerala, India

On-site

Source: LinkedIn

Requirements:
- 5+ years of experience as a Data Analyst or in a similar role, with a proven track record of collecting, cleaning, analyzing, and interpreting large datasets.
- Expertise in pipeline design and validation.
- Expertise in statistical methods, machine learning techniques, and data mining techniques.
- Proficiency in SQL, Python, PySpark, Looker, Prometheus, Carbon, ClickHouse, Kafka, HDFS and the ELK stack (Elasticsearch, Logstash, and Kibana).
- Experience with data visualization tools such as Grafana and Looker.
- Ability to work independently and as part of a team.
- Problem-solving and analytical skills to extract meaningful insights from data.
- Strong business acumen to understand the implications of data findings.

Responsibilities: collect, clean, and organize large datasets from various sources; perform data analysis using statistical methods, machine learning techniques, and data visualization tools; identify patterns, trends, and anomalies within datasets to uncover insights; develop and maintain data models to represent the organization's business operations; create interactive dashboards and reports to communicate data findings to stakeholders; and document data analysis procedures and findings to ensure knowledge transfer.

Also expected: high analytical skills, a high degree of initiative and flexibility, high customer orientation, high quality awareness, excellent verbal and written communication skills, logical thinking and problem-solving skills along with an ability to collaborate, domain knowledge of two or three industries, an understanding of the financial processes for various types of projects and the various pricing models available, client interfacing skills, knowledge of SDLC and agile methodologies, and project and team management.

Posted 6 days ago

Apply

15.0 years

0 Lacs

Chennai

On-site

Source: Glassdoor

Project Role: Data Engineer
Project Role Description: Design, develop and maintain data solutions for data generation, collection, and processing. Create data pipelines, ensure data quality, and implement ETL (extract, transform and load) processes to migrate and deploy data across systems.
Must-have skills: PySpark
Good-to-have skills: NA
Minimum 5 year(s) of experience is required
Educational Qualification: 15 years of full-time education

Summary: As a Data Engineer, you will design, develop, and maintain data solutions that facilitate data generation, collection, and processing. Your typical day will involve creating data pipelines, ensuring data quality, and implementing ETL processes to migrate and deploy data across various systems. You will collaborate with cross-functional teams to understand their data needs and provide effective solutions, ensuring that the data infrastructure is robust and scalable to meet the demands of the organization.

Roles & Responsibilities:
- Expected to be an SME.
- Collaborate with and manage the team to perform.
- Responsible for team decisions.
- Engage with multiple teams and contribute to key decisions.
- Provide solutions to problems for the immediate team and across multiple teams.
- Mentor junior team members to enhance their skills and knowledge in data engineering.
- Continuously evaluate and improve data processes to enhance efficiency and effectiveness.

Professional & Technical Skills:
- Must-have skills: Proficiency in PySpark.
- Good-to-have skills: Experience with Apache Kafka.
- Strong understanding of data warehousing concepts and architecture.
- Familiarity with cloud platforms such as AWS or Azure.
- Experience in SQL and NoSQL databases for data storage and retrieval.

Additional Information:
- The candidate should have a minimum of 5 years of experience in PySpark.
- This position is based in Chennai.
- 15 years of full-time education is required.

Posted 6 days ago

Apply

3.0 - 5.0 years

4 - 7 Lacs

Pune

Work from Office

Source: Naukri

Job Title: S&C Global Network - AI - CDP - Marketing Analytics - Analyst
Management Level: 11 - Analyst
Location: Bengaluru, BDC7C
Must-have skills: Data Analytics
Good-to-have skills: Ability to leverage design thinking, business process optimization, and stakeholder management skills.

Job Summary: This role involves driving strategic initiatives, managing business transformations, and leveraging industry expertise to create value-driven solutions.

Roles & Responsibilities: Provide strategic advisory services, conduct market research, and develop data-driven recommendations to enhance business performance.

What's in it for you: As part of our Analytics practice, you will join a worldwide network of over 20k smart and driven colleagues experienced in leading AI/ML/statistical tools, methods and applications. From data to analytics and insights to actions, our forward-thinking consultants provide analytically informed, issue-based insights at scale to help our clients improve outcomes and achieve high performance.

What you would do in this role: A Consultant/Manager for Customer Data Platforms serves as the day-to-day marketing technology point of contact and helps our clients get value out of their investment in a Customer Data Platform (CDP) by developing a strategic roadmap focused on personalized activation. You will be working with a multidisciplinary team of Solution Architects, Data Engineers, Data Scientists, and Digital Marketers.

Key Duties and Responsibilities:
- Be a platform expert in one or more leading CDP solutions, with developer-level expertise in Lytics, Segment, Adobe Experience Platform, Amperity, Tealium, Treasure Data etc., including custom-built CDPs.
- Deep developer-level expertise in real-time event tracking for web analytics, e.g. Google Tag Manager, Adobe Launch etc.
- Provide deep domain expertise in our clients' business and broad knowledge of digital marketing, together with a Marketing Strategist.
- Deep expert-level knowledge of GA360/GA4, Adobe Analytics, Google Ads, DV360, Campaign Manager, Facebook Ads Manager, The Trade Desk etc.
- Assess and audit the current state of a client's marketing technology stack (MarTech), including data infrastructure, ad platforms and data security policies, together with a Solutions Architect.
- Conduct stakeholder interviews and gather business requirements.
- Translate business requirements into BRDs and CDP customer analytics use cases, and structure the technical solution.
- Prioritize CDP use cases together with the client.
- Create a strategic CDP roadmap focused on data-driven marketing activation.
- Work with the Solution Architect to strategize, architect, and document a scalable CDP implementation tailored to the client's needs.
- Provide hands-on support and platform training for our clients.
- Data processing, data engineering and data schema/model expertise for CDPs, working on data models, unification logic etc.
- Work with Business Analysts, Data Architects, Technical Architects and DBAs to achieve project objectives - delivery dates, quality objectives etc.
- Business intelligence expertise for insights and actionable recommendations.
- Project management expertise for sprint planning.

Professional & Technical Skills:
- Relevant experience in the required domain.
- Strong analytical, problem-solving, and communication skills.
- Ability to work in a fast-paced, dynamic environment.
- Strong understanding of data governance and compliance (i.e. PII, PHI, GDPR, CCPA).
- Experience with analytics tools like Google Analytics or Adobe Analytics is a plus.
- Experience with A/B testing tools is a plus.
- Must have programming experience in PySpark, Python and shell scripts; RDBMS, T-SQL and NoSQL experience is a must.
- Manage large volumes of structured and unstructured data; extract and clean data to make it amenable for analysis.
- Experience in deploying and operationalizing code is an added advantage.
- Experience with source control systems such as Git and Bitbucket, and with Jenkins build and continuous integration tools.
- Proficient in Excel, MS Word, PowerPoint, etc.

Technical Skills:
- Experience with any CDP platform, e.g. Lytics, Segment, Adobe Experience Platform (Real-Time CDP), or a custom CDP on any cloud.
- GA4/GA360 and/or Adobe Analytics.
- Google Tag Manager, Adobe Launch, and/or any tag management tool.
- Google Ads, DV360, Campaign Manager, Facebook Ads Manager, The Trade Desk etc.
- Deep cloud experience (GCP, AWS, Azure).
- Advanced Python, SQL and shell scripting experience.
- Data migration, DevOps, MLOps, Terraform scripting.

Soft Skills: Strong problem-solving skills, good team player, attention to detail, good communication skills.

Additional Information:
- Opportunity to work on innovative projects.
- Career growth and leadership exposure.

Experience: 3-5 years
Educational Qualification: Any degree

Posted 6 days ago

Apply

3.0 years

0 Lacs

Bangalore Urban, Karnataka, India

On-site

Source: LinkedIn

About The Role
We are seeking a highly skilled Data Engineer with deep expertise in PySpark and the Cloudera Data Platform (CDP) to join our data engineering team. As a Data Engineer, you will be responsible for designing, developing, and maintaining scalable data pipelines that ensure high data quality and availability across the organization. This role requires a strong background in big data ecosystems, cloud-native tools, and advanced data processing techniques. The ideal candidate has hands-on experience with data ingestion, transformation, and optimization on the Cloudera Data Platform, along with a proven track record of implementing data engineering best practices. You will work closely with other data engineers to build solutions that drive impactful business insights.

Responsibilities
- Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and accuracy.
- Data Ingestion: Implement and manage data ingestion processes from a variety of sources (e.g., relational databases, APIs, file systems) to the data lake or data warehouse on CDP.
- Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business requirements.
- Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing runtime of ETL processes (a small example follows below).
- Data Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline.
- Automation and Orchestration: Automate data workflows using tools like Apache Oozie, Airflow, or similar orchestration tools within the Cloudera ecosystem.

Education and Experience
- Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field.
- 3+ years of experience as a Data Engineer, with a strong focus on PySpark and the Cloudera Data Platform.

Technical Skills
- PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization techniques.
- Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and HBase.
- Data Warehousing: Knowledge of data warehousing concepts, ETL best practices, and experience with SQL-based tools (e.g., Hive, Impala).
- Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools.
- Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration frameworks.
- Scripting and Automation: Strong scripting skills in Linux.
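For the performance-optimization bullet above, here is a small hedged PySpark example showing two routine techniques on a Hive-backed CDP table: partition pruning via a filter on the partition column and a broadcast join against a small dimension. Table, column, and partition names are illustrative assumptions.

```python
# Sketch of partition pruning and a broadcast join; all names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("cdp_optimization")
         .enableHiveSupport()
         .getOrCreate())

# Filtering on the partition column lets Spark prune Hive partitions at read time.
tx = spark.table("warehouse.transactions").filter(F.col("ds") == "2024-06-01")

# Broadcasting a small dimension avoids a shuffle-heavy sort-merge join.
dim = spark.table("warehouse.dim_store")
joined = tx.join(F.broadcast(dim), "store_id")

joined.groupBy("region").agg(F.sum("amount").alias("total")).show()
```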

Posted 6 days ago

Apply

3.0 years

0 Lacs

Bangalore Urban, Karnataka, India

On-site

Source: LinkedIn

Role: Data Engineer
Key Skills: PySpark, Cloudera Data Platform, Big Data - Hadoop, Hive, Kafka

Responsibilities
- Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and accuracy.
- Data Ingestion: Implement and manage data ingestion processes from a variety of sources (e.g., relational databases, APIs, file systems) to the data lake or data warehouse on CDP.
- Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business requirements.
- Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing runtime of ETL processes.
- Data Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline.
- Automation and Orchestration: Automate data workflows using tools like Apache Oozie, Airflow, or similar orchestration tools within the Cloudera ecosystem.

Technical Skills
- 3+ years of experience as a Data Engineer, with a strong focus on PySpark and the Cloudera Data Platform.
- PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization techniques.
- Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and HBase.
- Data Warehousing: Knowledge of data warehousing concepts, ETL best practices, and experience with SQL-based tools (e.g., Hive, Impala).
- Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools.
- Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration frameworks.
- Scripting and Automation: Strong scripting skills in Linux.

Posted 6 days ago

Apply

0 years

7 - 9 Lacs

Noida

On-site

Source: Glassdoor

Line of Service: Advisory
Industry/Sector: Not Applicable
Specialism: Data, Analytics & AI
Management Level: Senior Associate

Job Description & Summary
At PwC, our people in data and analytics engineering focus on leveraging advanced technologies and techniques to design and develop robust data solutions for clients. They play a crucial role in transforming raw data into actionable insights, enabling informed decision-making and driving business growth. In data engineering at PwC, you will focus on designing and building data infrastructure and systems to enable efficient data processing and analysis. You will be responsible for developing and implementing data pipelines, data integration, and data transformation solutions.

Why PwC
At PwC, you will be part of a vibrant community of solvers that leads with trust and creates distinctive outcomes for our clients and communities. This purpose-led and values-driven work, powered by technology in an environment that drives innovation, will enable you to make a tangible impact in the real world. We reward your contributions, support your wellbeing, and offer inclusive benefits, flexibility programmes and mentorship that will help you thrive in work and life. Together, we grow, learn, care, collaborate, and create a future of infinite experiences for each other. Learn more about us.

At PwC, we believe in providing equal employment opportunities, without any discrimination on the grounds of gender, ethnic background, age, disability, marital status, sexual orientation, pregnancy, gender identity or expression, religion or other beliefs, perceived differences and status protected by law. We strive to create an environment where each one of our people can bring their true selves and contribute to their personal growth and the firm’s growth. To enable this, we have zero tolerance for any discrimination and harassment based on the above considerations.

A career within Data and Analytics services will provide you with the opportunity to help organisations uncover enterprise insights and drive business results using smarter data analytics. We focus on a collection of organisational technology capabilities, including business intelligence, data management, and data assurance, that help our clients drive innovation, growth, and change within their organisations in order to keep up with the changing nature of customers and technology. We make impactful decisions by mixing mind and machine to leverage data, understand and navigate risk, and help our clients gain a competitive edge.

Responsibilities:
- Design, develop, and optimize data pipelines and ETL processes using PySpark or Scala to extract, transform, and load large volumes of structured and unstructured data from diverse sources.
- Implement data ingestion, processing, and storage solutions on the Azure cloud platform, leveraging services such as Azure Databricks, Azure Data Lake Storage, and Azure Synapse Analytics.
- Develop and maintain data models, schemas, and metadata to support efficient data access, query performance, and analytics requirements.
- Monitor pipeline performance, troubleshoot issues, and optimize data processing workflows for scalability, reliability, and cost-effectiveness.
- Implement data security and compliance measures to protect sensitive information and ensure regulatory compliance.

Requirements:
- Proven experience as a Data Engineer, with expertise in building and optimizing data pipelines using PySpark, Scala, and Apache Spark.
- Hands-on experience with cloud platforms, particularly Azure, and proficiency in Azure services such as Azure Databricks, Azure Data Lake Storage, Azure Synapse Analytics, and Azure SQL Database.
- Strong programming skills in Python and Scala, with experience in software development, version control, and CI/CD practices.
- Familiarity with data warehousing concepts, dimensional modeling, and relational databases (e.g., SQL Server, PostgreSQL, MySQL).
- Experience with big data technologies and frameworks (e.g., Hadoop, Hive, HBase) is a plus.

Mandatory skill sets: Spark, PySpark, Azure
Preferred skill sets: Spark, PySpark, Azure
Years of experience required: 4 - 8
Education qualification: B.Tech / M.Tech / MBA / MCA
Degrees/Field of Study required: Bachelor of Technology, Master of Business Administration
Required Skills: Microsoft Azure
Optional Skills: Accepting Feedback, Active Listening, Agile Scalability, Amazon Web Services (AWS), Analytical Thinking, Apache Airflow, Apache Hadoop, Azure Data Factory, Communication, Creativity, Data Anonymization, Data Architecture, Database Administration, Database Management System (DBMS), Database Optimization, Database Security Best Practices, Databricks Unified Data Analytics Platform, Data Engineering, Data Engineering Platforms, Data Infrastructure, Data Integration, Data Lake, Data Modeling, Data Pipeline {+ 27 more}
Travel Requirements: Not Specified
Available for Work Visa Sponsorship? No
Government Clearance Required? No

Posted 6 days ago

Apply

0 years

0 Lacs

Noida

On-site

Source: Glassdoor

Line of Service: Advisory
Industry/Sector: Not Applicable
Specialism: Data, Analytics & AI
Management Level: Manager

Job Description & Summary
At PwC, our people in data and analytics engineering focus on leveraging advanced technologies and techniques to design and develop robust data solutions for clients. They play a crucial role in transforming raw data into actionable insights, enabling informed decision-making and driving business growth. In data engineering at PwC, you will focus on designing and building data infrastructure and systems to enable efficient data processing and analysis. You will be responsible for developing and implementing data pipelines, data integration, and data transformation solutions.

Why PwC
At PwC, you will be part of a vibrant community of solvers that leads with trust and creates distinctive outcomes for our clients and communities. This purpose-led and values-driven work, powered by technology in an environment that drives innovation, will enable you to make a tangible impact in the real world. We reward your contributions, support your wellbeing, and offer inclusive benefits, flexibility programmes and mentorship that will help you thrive in work and life. Together, we grow, learn, care, collaborate, and create a future of infinite experiences for each other. Learn more about us.

At PwC, we believe in providing equal employment opportunities, without any discrimination on the grounds of gender, ethnic background, age, disability, marital status, sexual orientation, pregnancy, gender identity or expression, religion or other beliefs, perceived differences and status protected by law. We strive to create an environment where each one of our people can bring their true selves and contribute to their personal growth and the firm’s growth. To enable this, we have zero tolerance for any discrimination and harassment based on the above considerations.

A career within Data and Analytics services will provide you with the opportunity to help organisations uncover enterprise insights and drive business results using smarter data analytics. We focus on a collection of organisational technology capabilities, including business intelligence, data management, and data assurance, that help our clients drive innovation, growth, and change within their organisations in order to keep up with the changing nature of customers and technology. We make impactful decisions by mixing mind and machine to leverage data, understand and navigate risk, and help our clients gain a competitive edge.

Responsibilities:
- Design, develop, and optimize data pipelines and ETL processes using PySpark or Scala to extract, transform, and load large volumes of structured and unstructured data from diverse sources.
- Implement data ingestion, processing, and storage solutions on the Azure cloud platform, leveraging services such as Azure Databricks, Azure Data Lake Storage, and Azure Synapse Analytics.
- Develop and maintain data models, schemas, and metadata to support efficient data access, query performance, and analytics requirements.
- Monitor pipeline performance, troubleshoot issues, and optimize data processing workflows for scalability, reliability, and cost-effectiveness.
- Implement data security and compliance measures to protect sensitive information and ensure regulatory compliance.

Requirements:
- Proven experience as a Data Engineer, with expertise in building and optimizing data pipelines using PySpark, Scala, and Apache Spark.
- Hands-on experience with cloud platforms, particularly Azure, and proficiency in Azure services such as Azure Databricks, Azure Data Lake Storage, Azure Synapse Analytics, and Azure SQL Database.
- Strong programming skills in Python and Scala, with experience in software development, version control, and CI/CD practices.
- Familiarity with data warehousing concepts, dimensional modeling, and relational databases (e.g., SQL Server, PostgreSQL, MySQL).
- Experience with big data technologies and frameworks (e.g., Hadoop, Hive, HBase) is a plus.

Mandatory skill sets: Spark, PySpark, Azure
Preferred skill sets: Spark, PySpark, Azure
Years of experience required: 8 - 12
Education qualification: B.Tech / M.Tech / MBA / MCA
Degrees/Field of Study required: Bachelor of Engineering, Master of Engineering, Master of Business Administration
Required Skills: Data Science
Optional Skills: Accepting Feedback, Active Listening, Agile Scalability, Amazon Web Services (AWS), Analytical Thinking, Apache Airflow, Apache Hadoop, Azure Data Factory, Coaching and Feedback, Communication, Creativity, Data Anonymization, Data Architecture, Database Administration, Database Management System (DBMS), Database Optimization, Database Security Best Practices, Databricks Unified Data Analytics Platform, Data Engineering, Data Engineering Platforms, Data Infrastructure, Data Integration, Data Lake, Data Modeling {+ 32 more}
Travel Requirements: Not Specified
Available for Work Visa Sponsorship? No
Government Clearance Required? No

Posted 6 days ago

Apply

5.0 - 10.0 years

4 - 8 Lacs

Pune

Work from Office

Source: Naukri

We are organizing a direct walk-in drive at our Pune location. Please find below the details and skills for the walk-in at TCS - Pune on 21st June 2025.
Experience: 5-10 years
Skills:
(1) Dot Net
(2) AWS Data, PySpark, Redshift
(3) AWS Node JS
(4) Azure DevOps with Terraform
(5) Java Spring Boot Microservices
(6) Mainframe, CICS, COBOL, DB2

Posted 6 days ago

Apply

4.0 - 9.0 years

20 - 30 Lacs

Pune, Bengaluru

Hybrid

Source: Naukri

Job role & responsibilities:
- Understand operational needs by collaborating with specialized teams.
- Support key business operations: this involves supporting architecture design and improvements, understanding data integrity, building data models, and designing and implementing agile, scalable, and cost-efficient solutions.
- Lead a team of developers; implement sprint planning and execution to ensure timely deliveries.

Technical skills, qualification and experience required:
- Proficient in data modelling, with 4-10 years of experience in data modelling.
- Experience with data modelling tools (Erwin) and building ER diagrams.
- Hands-on experience with the Erwin / Visio tools.
- Hands-on expertise in entity-relationship, dimensional and NoSQL modelling.
- Familiarity with manipulating datasets using Python.
- Exposure to Azure cloud services (Azure Data Factory, Azure DevOps and Databricks).
- Exposure to UML tools like Erwin/Visio.
- Familiarity with tools such as Azure DevOps, Jira and GitHub.
- Analytical approaches using IE or other common notations.
- Strong hands-on experience in SQL scripting.
- Bachelor's/Master's degree in Computer Science or a related field.
- Experience leading agile scrum, sprint planning and review sessions.
- Good communication and interpersonal skills, with the ability to coordinate between business stakeholders and engineers.
- Strong results orientation and time management.
- True team player who is comfortable working in a global team.
- Ability to establish relationships with stakeholders quickly in order to collaborate on use cases.
- Autonomy, curiosity and innovation capability.
- Comfortable working in a multidisciplinary team within a fast-paced environment.

Note: Immediate joiners will be preferred.

Posted 6 days ago

Apply

0 years

0 Lacs

Hyderabad, Telangana, India

Remote

Source: LinkedIn

When you join Verizon
You want more out of a career. A place to share your ideas freely - even if they’re daring or different. Where the true you can learn, grow, and thrive. At Verizon, we power and empower how people live, work and play by connecting them to what brings them joy. We do what we love - driving innovation, creativity, and impact in the world. Our V Team is a community of people who anticipate, lead, and believe that listening is where learning begins. In crisis and in celebration, we come together - lifting our communities and building trust in how we show up, everywhere & always. Want in? Join the #VTeamLife.

What you’ll be doing...
- Designing and implementing ML model pipelines (batch and real-time) for efficient model training and serving/inference (a batch-scoring sketch follows below).
- Implementing and analyzing the performance of advanced algorithms, specifically deep-learning-based ML models.
- Solving model inferencing failures/fallouts.
- Optimizing existing machine-learning model pipelines to ensure that training/inferencing completes within the standard duration.
- Collaborating effectively with cross-functional teams to understand business needs and deliver impactful solutions.
- Contributing to developing robust and scalable distributed computing systems for large-scale data processing.
- Designing, developing, and implementing innovative AI/ML solutions using Python, CI/CD, and public cloud platforms.
- Implementing a model performance metrics pipeline for predictive models, covering different types of algorithms, to adhere to Responsible AI.

What we’re looking for...
You’ll need to have:
- Bachelor's degree or four or more years of work experience.
- Four or more years of relevant work experience.
- Experience in batch model inferencing and real-time model serving.
- Knowledge of frameworks such as BentoML, TensorFlow Serving (TFX) or Triton.
- Solid expertise in GCP Cloud ML tech stacks such as BigQuery, Dataproc, Airflow, Cloud Functions, Spanner, and Dataflow.
- Very good experience with languages such as Python and PySpark.
- Expertise in distributed computation and multi-node distributed model training.
- Good understanding of GPU usage management.
- Experience with Ray Core and Ray Serve (batch and real-time models).
- Experience with CI/CD practices.

Even better if you have one or more of the following:
- GCP certifications or any cloud certification in AI/ML or Data.

If Verizon and this role sound like a fit for you, we encourage you to apply even if you don’t meet every “even better” qualification listed above.

Where you’ll be working: In this hybrid role, you'll have a defined work location that includes work from home and assigned office days set by your manager.
Scheduled Weekly Hours: 40

Equal Employment Opportunity
Verizon is an equal opportunity employer. We evaluate qualified applicants without regard to race, gender, disability or any other legally protected characteristics.
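As a hedged illustration of the batch-inference pipeline work described above, the sketch below scores a Spark table with a pre-trained scikit-learn-style model through a pandas UDF. The model path, feature columns, and table names are hypothetical, and this broadcast-model pattern is only one of several ways to serve a model in batch.

```python
# Sketch only: batch scoring with a broadcast model and a scalar pandas UDF.
import joblib
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("batch_inference").getOrCreate()

model = joblib.load("/models/churn_model.pkl")   # hypothetical pre-trained classifier
bc_model = spark.sparkContext.broadcast(model)   # shipped once to each executor

@pandas_udf(DoubleType())
def churn_score(tenure: pd.Series, monthly_charges: pd.Series) -> pd.Series:
    features = pd.DataFrame({"tenure": tenure, "monthly_charges": monthly_charges})
    return pd.Series(bc_model.value.predict_proba(features)[:, 1])

scored = (spark.table("features.customers")
          .withColumn("churn_score", churn_score("tenure", "monthly_charges")))
scored.write.mode("overwrite").saveAsTable("predictions.customer_churn")
```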

Posted 6 days ago

Apply

3.0 - 5.0 years

15 - 17 Lacs

Pune

Work from Office

Source: Naukri

Performance Testing Specialist - Databricks Pipelines

Key Responsibilities:
- Design and execute performance testing strategies specifically for Databricks-based data pipelines.
- Identify performance bottlenecks and provide optimization recommendations across Spark/Databricks workloads.
- Collaborate with development and DevOps teams to integrate performance testing into CI/CD pipelines.
- Analyze job execution metrics, cluster utilization, memory/storage usage, and latency across various stages of data pipeline processing.
- Create and maintain performance test scripts, frameworks, and dashboards using tools like JMeter, Locust, or custom Python utilities (a small example follows below).
- Generate detailed performance reports and suggest tuning at the code, configuration, and platform levels.
- Conduct root cause analysis for slow-running ETL/ELT jobs and recommend remediation steps.
- Participate in production issue resolution related to performance and contribute to RCA documentation.

Technical Skills (Mandatory):
- Strong understanding of Databricks, Apache Spark, and performance tuning techniques for distributed data processing systems.
- Hands-on experience in Spark (PySpark/Scala) performance profiling, partitioning strategies, and job parallelization.
- 2+ years of experience in performance testing and load simulation of data pipelines.
- Solid skills in SQL, Snowflake, and analyzing performance via query plans and optimization hints.
- Familiarity with Azure Databricks, Azure Monitor, Log Analytics, or similar observability tools.
- Proficient in scripting (Python/Shell) for test automation and pipeline instrumentation.
- Experience with DevOps tools such as Azure DevOps, GitHub Actions, or Jenkins for automated testing.
- Comfortable working in Unix/Linux environments and writing shell scripts for monitoring and debugging.

Good to Have:
- Experience with job schedulers like Control-M, Autosys, or Azure Data Factory trigger flows.
- Exposure to CI/CD integration for automated performance validation.
- Understanding of network/storage I/O tuning parameters in cloud-based environments.
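One possible shape for the "custom Python utilities" mentioned above: a tiny timing harness that runs a representative pipeline stage several times and reports latency statistics, which can then feed a report or dashboard. The staged query is a placeholder workload, not anything specified in the posting.

```python
# Minimal timing harness for a Spark/Databricks pipeline stage; workload is illustrative.
import statistics
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("perf_probe").getOrCreate()

def time_stage(run_stage, iterations=5):
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_stage()
        durations.append(time.perf_counter() - start)
    return {"min_s": min(durations),
            "p50_s": statistics.median(durations),
            "max_s": max(durations)}

def stage():
    # Representative workload: force full evaluation of an aggregation.
    spark.table("curated.sales").groupBy("region").count().collect()

print(time_stage(stage))
```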

Posted 6 days ago

Apply

10.0 - 16.0 years

15 - 30 Lacs

Bengaluru, Mumbai (All Areas)

Work from Office

Source: Naukri

Gracenote, a Nielsen company, is dedicated to connecting audiences to the entertainment they love, powering a better media future for all people. Gracenote is the content data business unit of Nielsen that powers innovative entertainment experiences for the world's leading media companies. Our entertainment metadata and connected IDs deliver advanced content navigation and discovery to connect consumers to the content they love and help them discover new content. Gracenote's industry-leading datasets cover TV programs, movies, sports, music and podcasts in 80 countries and 35 languages. Common identifiers, universally adopted by the world's leading media companies, deliver powerful cross-media entertainment experiences. Machine-driven, human-validated, best-in-class data and images fuel new search and discovery experiences across every screen.

Gracenote's Data Organization is a dynamic and innovative group that is essential in delivering business outcomes through data, insights, and predictive and prescriptive analytics. It is an extremely motivated team that values creativity and experimentation through continuous learning in an agile and collaborative manner. From designing, developing and maintaining data architecture that satisfies our business goals to managing data governance and region-specific regulations, the data team oversees the whole data lifecycle.

Role Overview: We are seeking an experienced Senior Data Engineer with 10-12 years of experience to join our Video Engineering team at Gracenote - a NielsenIQ company. In this role, you will design, build, and maintain our data processing systems and pipelines. You will work closely with product managers, architects, analysts, and other stakeholders to ensure data is accessible, reliable, and optimized for business, analytical and operational needs.

Key Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL processes.
- Architect and implement data warehousing solutions and data lakes.
- Optimize data flow and collection for cross-functional teams.
- Build the infrastructure required for optimal extraction, transformation, and loading of data.
- Ensure data quality, reliability, and integrity across all data systems.
- Collaborate with data scientists and analysts to help implement models and algorithms.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, etc.
- Create and maintain comprehensive technical documentation.
- Mentor junior engineers and provide technical leadership.
- Evaluate and integrate new data management technologies and tools.
- Implement optimization strategies to enable and maintain sub-second latency.
- Oversee data infrastructure to ensure robust deployment and monitoring of pipelines and processes.
- Stay ahead of emerging trends in data and cloud, integrating new research into practical applications.
- Mentor and grow a team of junior data engineers.

Required Qualifications and Skills:
- Expert-level proficiency in Python, SQL, and big data tools (Spark, Kafka, Airflow).
- Bachelor's degree in Computer Science, Engineering, or a related field; Master's degree preferred.
- Expert knowledge of SQL and experience with relational databases (e.g., PostgreSQL, Redshift, TiDB, MySQL, Oracle, Teradata).
- Extensive experience with big data technologies (e.g., Hadoop, Spark, Hive, Flink).
- Proficiency in at least one programming language such as Python, Java, or Scala.
- Experience with data modeling, data warehousing, and building ETL pipelines.
- Strong knowledge of data pipeline and workflow management tools (e.g., Airflow, Luigi, NiFi); an Airflow sketch follows below.
- Experience with cloud platforms (AWS, Azure, or GCP) and their data services; AWS preferred.
- Hands-on experience building streaming pipelines with Flink, Kafka, or Kinesis; Flink preferred.
- Understanding of data governance and data security principles.
- Experience with version control systems (e.g., Git) and CI/CD practices.
- Proven leadership skills in developing data engineering teams.

Preferred Skills:
- Experience with containerization and orchestration tools (Docker, Kubernetes).
- Basic knowledge of machine learning workflows and MLOps.
- Experience with NoSQL databases (MongoDB, Cassandra, etc.).
- Familiarity with data visualization tools (Tableau, Power BI, etc.).
- Experience with real-time data processing.
- Knowledge of data governance frameworks and compliance requirements (GDPR, CCPA, etc.).
- Experience with infrastructure-as-code tools (Terraform, CloudFormation).
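To illustrate the workflow-orchestration skills listed above, here is a hedged Airflow sketch of a daily extract-transform-load DAG. It assumes Airflow 2.4 or later; the DAG id, schedule, and task callables are placeholders rather than anything specific to this role.

```python
# Illustrative daily ETL DAG; ids, schedule, and callables are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw metadata from source systems")

def transform():
    print("normalise identifiers and enrich records")

def load():
    print("publish curated tables to the warehouse")

with DAG(dag_id="metadata_daily",
         start_date=datetime(2024, 1, 1),
         schedule="@daily",
         catchup=False) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_transform >> t_load
```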

Posted 6 days ago

Apply

6.0 - 10.0 years

15 - 25 Lacs

Bengaluru

Work from Office

Source: Naukri

Who We Are At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. The Role At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. As a Data Engineer , you will leverage your expertise in Databricks , big data platforms , and modern data engineering practices to develop scalable data solutions for our clients. Candidates with healthcare experience, particularly with EPIC systems , are strongly encouraged to apply. This includes creating data pipelines, integrating data from various sources, and implementing data security and privacy measures. The Data Engineer will also be responsible for monitoring and troubleshooting data flows and optimizing data storage and processing for performance and cost efficiency. Responsibilities: Develop data ingestion, data processing and analytical pipelines for big data, relational databases and data warehouse solutions Design and implement data pipelines and ETL/ELT processes using Databricks, Apache Spark, and related tools. Collaborate with business stakeholders, analysts, and data scientists to deliver accessible, high-quality data solutions. Provide guidance on cloud migration strategies and data architecture patterns such as Lakehouse and Data Mesh Provide pros/cons, and migration considerations for private and public cloud architectures Provide technical expertise in troubleshooting, debugging, and resolving complex data and system issues. Create and maintain technical documentation, including system diagrams, deployment procedures, and troubleshooting guides Experience working with Data Governance, Data security and Data Privacy (Unity Catalogue or Purview) If you're ready to embrace the power of data to transform our business and embark on an epic data adventure, then join us at Kyndryl. Together, let's redefine what's possible and unleash your potential. Your Future at Kyndryl Every position at Kyndryl offers a way forward to grow your career. We have opportunities that you won’t find anywhere else, including hands-on experience, learning opportunities, and the chance to certify in all four major platforms. Whether you want to broaden your knowledge base or narrow your scope and specialize in a specific sector, you can find your opportunity here. Who You Are You’re good at what you do and possess the required experience to prove it. However, equally as important – you have a growth mindset; keen to drive your own personal and professional development. You are customer-focused – someone who prioritizes customer success in their work. And finally, you’re open and borderless – naturally inclusive in how you work with others. Required Technical and Professional Experience: 3+ years of consulting or client service delivery experience on Azure Graduate/Postgraduate in computer science, computer engineering, or equivalent with minimum of 8 years of experience in the IT industry. 
3+ years of experience in developing data ingestion, data processing and analytical pipelines for big data, relational databases such as SQL Server and data warehouse solutions such as Azure Synapse. Extensive hands-on experience implementing data ingestion, ETL and data processing. Hands-on experience with Big Data technologies such as Java, Python, SQL, ADLS/Blob, PySpark and Spark SQL, Databricks, HDInsight and live streaming technologies such as EventHub. Experience with cloud-based database technologies (Azure PaaS DB, AWS RDS and NoSQL). Cloud migration methodologies and processes including tools like Azure Data Factory, Data Migration Service, etc. Experience with monitoring and diagnostic tools (SQL Profiler, Extended Events, etc.). Expertise in data mining, data storage and Extract-Transform-Load (ETL) processes. Experience with relational databases and expertise in writing and optimizing T-SQL queries and stored procedures. Experience in using Big Data file formats and compression techniques. Experience in developer tools such as Azure DevOps, Visual Studio Team Server, Git, Jenkins, etc. Experience with private and public cloud architectures, pros/cons, and migration considerations. Excellent problem-solving, analytical, and critical thinking skills. Ability to manage multiple projects simultaneously, while maintaining a high level of attention to detail. Communication Skills: Must be able to communicate with both technical and non-technical audiences. Able to derive technical requirements with the stakeholders. Preferred Technical And Professional Experience Cloud platform certification, e.g., Microsoft Certified: (DP-700) Azure Data Engineer Associate, AWS Certified Data Analytics – Specialty, Elastic Certified Engineer, Google Cloud Professional Data Engineer. Professional certification, e.g., Open Certified Technical Specialist with Data Engineering Specialization. Experience working with EPIC healthcare systems (e.g., Clarity, Caboodle). Databricks certifications (e.g., Databricks Certified Data Engineer Associate or Professional). Knowledge of GenAI tools, Microsoft Fabric, or Microsoft Copilot. Familiarity with healthcare data standards and compliance (e.g., HIPAA, GDPR). Experience with DevSecOps and CI/CD deployments. Experience in NoSQL database design. Knowledge of Gen AI fundamentals and supporting industry use cases. Hands-on experience with Delta Lake and Delta Tables within the Databricks environment for building scalable and reliable data pipelines. Being You Diversity is a whole lot more than what we look like or where we come from, it’s how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we’re not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you – and everyone next to you – the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That’s the Kyndryl Way. What You Can Expect With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. 
Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter – wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you, we want you to succeed so that together, we will all succeed. Get Referred! If you know someone that works at Kyndryl, when asked ‘How Did You Hear About Us’ during the application process, select ‘Employee Referral’ and enter your contact's Kyndryl email address.

Posted 6 days ago

Apply

4.0 years

0 Lacs

Kochi, Kerala, India

On-site

Linkedin logo

Introduction In this role, you’ll work in one of our IBM Consulting Client Innovation Centers (Delivery Centers), where we deliver deep technical and industry expertise to a wide range of public and private sector clients around the world. Our delivery centers offer our clients locally based skills and technical expertise to drive innovation and adoption of new technology. Your Role And Responsibilities As Data Engineer, you will develop, maintain, evaluate and test big data solutions. You will be involved in the development of data solutions using the Spark Framework with Python or Scala on Hadoop and the AWS Cloud Data Platform. Responsibilities: Experienced in building data pipelines to ingest, process, and transform data from files, streams and databases. Process the data with Spark, Python, PySpark, Scala, and Hive, HBase or other NoSQL databases on Cloud Data Platforms (AWS) or HDFS. Experienced in developing efficient software code for multiple use cases leveraging the Spark Framework with Python or Scala and Big Data technologies for various use cases built on the platform. Experience in developing streaming pipelines. Experience working with Hadoop / AWS ecosystem components to implement scalable solutions that meet ever-increasing data volumes, using big data/cloud technologies such as Apache Spark, Kafka and cloud computing services. Preferred Education Master's Degree Required Technical And Professional Expertise Minimum 4+ years of experience in Big Data technologies with extensive data engineering experience in Spark / Python or Scala; Minimum 3 years of experience on Cloud Data Platforms on AWS; Experience in AWS EMR / AWS Glue / Databricks, AWS Redshift, DynamoDB. Good to excellent SQL skills. Exposure to streaming solutions and message brokers like Kafka. Preferred Technical And Professional Experience Certification in AWS and Databricks, or Cloudera Certified Spark Developer.

Posted 6 days ago

Apply

175.0 years

0 Lacs

Gurugram, Haryana, India

On-site

Linkedin logo

At American Express, our culture is built on a 175-year history of innovation, shared values and Leadership Behaviors, and an unwavering commitment to back our customers, communities, and colleagues. As part of Team Amex, you'll experience this powerful backing with comprehensive support for your holistic well-being and many opportunities to learn new skills, develop as a leader, and grow your career. Here, your voice and ideas matter, your work makes an impact, and together, you will help us define the future of American Express. How will you make an impact in this role? The Digital Data Strategy Team within the broader EDEA (Enterprise Digital Experimentation & Analytics) in EDDS supports all other EDEA VP teams and product & marketing partner teams with data strategy, automation & insights and creates and manages automated insight packs and multiple derived data layers. The team partners with Technology to enable end to end MIS Automation, ODL(Organized Data Layer) creation, drives process automation, optimization, Data & MIS Quality in an efficient manner. The team also supports strategic Data & Platform initiatives. This role will report to the Director – Digital Data Strategy, EDEA and will be based in Gurgaon. The candidate will be responsible for delivery of high impactful data and automated insights products to enable other analytics partners, marketing partners and product owners to optimize across our platform, demand generation, acquisition and membership experience domains. Your responsibilities include: Elevate Data Intelligence: Set vision for Intuitive, integrated and intelligent frameworks to enable smart Insights. Discover new sources of information for strong enrichment of business applications. Modernization: Keep up with the latest industry research and emerging technologies to ensure we are appropriately leveraging new techniques and capabilities and drive strategic change in tools & capabilities. Develop roadmap to transition our analytical and production usecases to the cloud platform and develop next generation MIS products through modern full stack BI tools & enable self-serve analytics Define digital data strategy vision as the business owner of digital analytics data & partner to achieve the vision of Data as a Service to enable Unified, Scalable & Secure data assets for business applications Strong understanding of key drivers & dynamics of Digital Data, Data Architecture & Design, Data Linkage & Usages. In depth knowledge of platforms like Big Data/Cornerstone, Lumi/Google Cloud Platform, Data Ingestion and Organized Data Layers. Being abreast of the latest industry & enterprise wide data governance, data quality practices, privacy policies and engrain the same in all data products & capabilities and be a guiding light for broader team. Partner and collaborate with multiple partners, agency & colleagues to develop Capabilities that will help in maximizing demand generation program ROI. Lead and develop a highly engaged team with a diverse skill-set to deliver automated digital & data solutions Minimum Qualifications 5+ years with relevant experience in the Automation, Data Product Management/Data Strategy with adequate data quality, economies of scale and process governance Proven thought leadership, Solid project management skills, strong communication, collaboration, relationship and conflict management skills Bachelors or Master’s degree in Engineering/Management Knowledge of Big Data oriented tools (e.g. 
BigQuery, Hive, SQL, Python/R, PySpark); Advanced Excel/VBA and PowerPoint; Experience managing complex processes and integration with upstream and downstream systems/processes. Hands-on experience with visualization tools like Tableau, Power BI, Sisense, etc. Preferred Qualifications Strong analytical/conceptual thinking competence to solve unstructured and complex business problems and articulate key findings to senior leaders/partners in a succinct and concise manner. Strong understanding of internal platforms like Big Data/Cornerstone, Lumi/Google Cloud Platform. Knowledge of Agile tools and methodologies. Enterprise Leadership Behaviors: Set the Agenda: Define What Winning Looks Like, Put Enterprise Thinking First, Lead with an External Perspective Bring Others with You: Build the Best Team, Seek & Provide Coaching Feedback, Make Collaboration Essential Do It the Right Way: Communicate Frequently, Candidly & Clearly, Make Decisions Quickly & Effectively, Live the Blue Box Values, Great Leadership Demands Courage We back you with benefits that support your holistic well-being so you can be and deliver your best. This means caring for you and your loved ones’ physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally: Competitive base salaries Bonus incentives Support for financial well-being and retirement Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location) Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need Generous paid parental leave policies (depending on your location) Free access to global on-site wellness centers staffed with nurses and doctors (depending on location) Free and confidential counseling support through our Healthy Minds program Career development and training opportunities American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law. Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.

Posted 6 days ago

Apply

9.0 - 14.0 years

9 - 24 Lacs

Visakhapatnam

Work from Office

Naukri logo

Responsibilities: * Design, develop & maintain data pipelines using PySpark, SQL & databases. * Collaborate with cross-functional teams on project delivery. * Strong in Databricks, PySpark, SQL. * Databricks certification is mandatory. * Location: Remote

Posted 6 days ago

Apply

10.0 - 15.0 years

22 - 37 Lacs

Bengaluru

Work from Office

Naukri logo

Who We Are At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. The Role At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. As GCP Data Engineer at Kyndryl, you will be responsible for designing and developing data pipelines, participating in architectural discussions, and implementing data solutions in a cloud environment using GCP data services. You will collaborate with global architects and business teams to design and deploy innovative solutions, supporting data analytics, automation, and transformation needs. Responsibilities: Design, develop, and maintain scalable data pipelines using GCP services such as BigQuery, Dataflow, Pub/Sub, and Cloud Storage. Participate in architectural discussions, conduct system analysis, and suggest optimal solutions that are scalable, future-proof, and aligned with business requirements. Collaborate with stakeholders to gather requirements and create high-level and detailed technical designs. Design data models suitable for both transactional and big data environments, supporting Machine Learning workflows. Build and optimize ETL/ELT infrastructure using a variety of data sources and GCP services. Develop and maintain Python / PySpark for data processing and integrate with GCP services for seamless data operations. Develop and optimize SQL queries for data analysis and reporting. Monitor and troubleshoot data pipeline issues to ensure timely resolution. Implement data governance and security best practices within GCP. Perform data quality checks and validation to ensure accuracy and consistency. Support DevOps automation efforts to ensure smooth integration and deployment of data pipelines. Provide design expertise in Master Data Management (MDM), Data Quality, and Metadata Management. Provide technical support and guidance to junior data engineers and other team members. Participate in code reviews and contribute to continuous improvement of data engineering practices. Implement best practices for cost management and resource utilization within GCP. If you're ready to embrace the power of data to transform our business and embark on an epic data adventure, then join us at Kyndryl. Together, let's redefine what's possible and unleash your potential. Your Future at Kyndryl Every position at Kyndryl offers a way forward to grow your career. We have opportunities that you won’t find anywhere else, including hands-on experience, learning opportunities, and the chance to certify in all four major platforms. Whether you want to broaden your knowledge base or narrow your scope and specialize in a specific sector, you can find your opportunity here. Who You Are You’re good at what you do and possess the required experience to prove it. However, equally as important – you have a growth mindset; keen to drive your own personal and professional development. You are customer-focused – someone who prioritizes customer success in their work. 
And finally, you’re open and borderless – naturally inclusive in how you work with others. Required Technical and Professional Experience: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field with over 8 years of experience in data engineering. More than 3 years of experience with the GCP data ecosystem. Hands-on experience and strong proficiency with GCP components such as Dataflow, Dataproc, BigQuery, Cloud Functions, Composer, Data Fusion. Excellent command of SQL with the ability to write complex queries and perform advanced data transformation. Strong programming skills in PySpark and/or Python, specifically for building cloud-native data pipelines. Familiarity with GCP tools like Looker, Airflow DAGs, Data Studio, App Maker, etc. Hands-on experience implementing enterprise-wide cloud data lake and data warehouse solutions on GCP. Knowledge of data governance, security, and compliance best practices. Experience with private and public cloud architectures, pros/cons, and migration considerations. Excellent problem-solving, analytical, and critical thinking skills. Ability to manage multiple projects simultaneously, while maintaining a high level of attention to detail. Communication Skills: Must be able to communicate with both technical and non-technical audiences. Able to derive technical requirements with the stakeholders. Ability to work independently and in agile teams. Preferred Technical And Professional Experience GCP Data Engineer Certification is highly preferred. Professional certification, e.g., Open Certified Technical Specialist with Data Engineering Specialization. Experience working as a Data Engineer and/or in cloud modernization. Knowledge of Databricks or Snowflake for data analytics. Experience in NoSQL databases. Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes). Familiarity with BI dashboards and Google Data Studio is a plus. Being You Diversity is a whole lot more than what we look like or where we come from, it’s how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we’re not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you – and everyone next to you – the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That’s the Kyndryl Way. What You Can Expect With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter – wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you; we want you to succeed so that together, we will all succeed. Get Referred! 
If you know someone that works at Kyndryl, when asked ‘How Did You Hear About Us’ during the application process, select ‘Employee Referral’ and enter your contact's Kyndryl email address.

Posted 6 days ago

Apply

7.0 - 10.0 years

0 Lacs

Vadodara, Gujarat, India

On-site

Linkedin logo

We’re reinventing the market research industry. Let’s reinvent it together. At Numerator, we believe tomorrow’s success starts with today’s market intelligence. We empower the world’s leading brands and retailers with unmatched insights into consumer behavior and the influencers that drive it. We are seeking a highly skilled Technical Delivery Lead - Data Engineer with extensive experience in analysing existing data/databases and designing, building, and optimizing high-volume data pipelines. The ideal candidate will have strong expertise in Python, databases, Databricks on Azure Cloud services, DevOps, and CI/CD tools, along with a solid understanding of AI/ML techniques and big data processing frameworks like Apache Spark and PySpark. Responsibilities Adhere to coding and Numerator technology standards Build suitable automation test suites within Azure DevOps Maintain and update automation test suites as required Carry out manual testing, load testing, exploratory testing as required Perform technical analysis and work closely with Business Analysts and Senior Data Developers to consistently deliver sprint goals Assist in estimation of sprint-by-sprint stories and tasks Proactively take a responsible approach to product delivery What You'll Bring to Numerator 7-10 years of experience in data engineering roles, handling large databases Good C# and Python skills Experience working with Microsoft Azure Cloud Experience in Agile methodologies (Scrum/Kanban) Experience with Apache Spark, PySpark, Databricks Experience working with DevOps pipelines, preferably Azure DevOps Preferred Qualifications Bachelor's or Master's degree in Computer Science, Information Technology, Data Science, or a related field. Experience working in a technical development/support focused role Knowledge/experience in AI/ML techniques Knowledge/experience in Visual Basic 6 Certification in a relevant Data Engineering discipline or related fields.

Posted 6 days ago

Apply

Exploring PySpark Jobs in India

PySpark, the Python API for the Apache Spark distributed data processing engine, is in high demand in the Indian job market. With the increasing need for big data processing and analysis, companies are actively seeking professionals with PySpark skills to join their teams. If you are a job seeker looking to excel in the field of big data and analytics, exploring PySpark jobs in India could be a great career move.
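
For readers new to the framework, here is a minimal, illustrative sketch of the kind of code PySpark roles involve. It assumes a local Spark installation with the pyspark package available; the file name sales.csv and the column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Start a local Spark session (the entry point for DataFrame work).
    spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

    # Read a CSV file into a DataFrame, letting Spark infer column types.
    df = spark.read.csv("sales.csv", header=True, inferSchema=True)

    # A typical transformation: total sales per region, largest first.
    summary = (
        df.groupBy("region")
          .agg(F.sum("amount").alias("total_amount"))
          .orderBy(F.desc("total_amount"))
    )
    summary.show()

    spark.stop()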

Top Hiring Locations in India

Here are 5 major cities in India where companies are actively hiring for PySpark roles:

1. Bangalore
2. Pune
3. Hyderabad
4. Mumbai
5. Delhi

Average Salary Range

The estimated salary range for PySpark professionals in India varies based on experience levels. Entry-level positions can expect to earn around INR 6-8 lakhs per annum, while experienced professionals can earn upwards of INR 15 lakhs per annum.

Career Path

In the field of PySpark, a typical career progression may look like this:

1. Junior Developer
2. Data Engineer
3. Senior Developer
4. Tech Lead
5. Data Architect

Related Skills

In addition to PySpark, professionals in this field are often expected to have or develop skills in:

  • Python programming
  • Apache Spark
  • Big data technologies (Hadoop, Hive, etc.)
  • SQL
  • Data visualization tools (Tableau, Power BI)
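
As a small illustration of how two of these skills meet in practice, the sketch below registers a PySpark DataFrame as a temporary view and queries it with plain SQL; the data, view name, and column names are invented for the example.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-on-spark").getOrCreate()

    # A tiny in-memory DataFrame standing in for a real table.
    orders = spark.createDataFrame(
        [("Bangalore", 1200.0), ("Pune", 800.0), ("Bangalore", 450.0)],
        ["city", "amount"],
    )

    # Register the DataFrame as a temporary view so SQL can reference it.
    orders.createOrReplaceTempView("orders")

    # Standard SQL runs against the view and returns another DataFrame.
    spark.sql(
        "SELECT city, SUM(amount) AS total FROM orders GROUP BY city ORDER BY total DESC"
    ).show()

    spark.stop()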

Interview Questions

Here are 25 interview questions you may encounter when applying for PySpark roles; a short code sketch after the list illustrates a few of the basics:

  • Explain what PySpark is and its main features (basic)
  • What are the advantages of using PySpark over other big data processing frameworks? (medium)
  • How do you handle missing or null values in PySpark? (medium)
  • What is RDD in PySpark? (basic)
  • What is a DataFrame in PySpark and how is it different from an RDD? (medium)
  • How can you optimize performance in PySpark jobs? (advanced)
  • Explain the difference between map and flatMap transformations in PySpark (basic)
  • What is the role of a SparkContext in PySpark? (basic)
  • How do you handle schema inference in PySpark? (medium)
  • What is a SparkSession in PySpark? (basic)
  • How do you join DataFrames in PySpark? (medium)
  • Explain the concept of partitioning in PySpark (medium)
  • What is a UDF in PySpark? (medium)
  • How do you cache DataFrames in PySpark for optimization? (medium)
  • Explain the concept of lazy evaluation in PySpark (medium)
  • How do you handle skewed data in PySpark? (advanced)
  • What is checkpointing in PySpark and how does it help in fault tolerance? (advanced)
  • How do you tune the performance of a PySpark application? (advanced)
  • Explain the use of Accumulators in PySpark (advanced)
  • How do you handle broadcast variables in PySpark? (advanced)
  • What are the different data sources supported by PySpark? (medium)
  • How can you run PySpark on a cluster? (medium)
  • What is the purpose of the PySpark MLlib library? (medium)
  • How do you handle serialization and deserialization in PySpark? (advanced)
  • What are the best practices for deploying PySpark applications in production? (advanced)
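
The hedged sketch below is one way to rehearse a few of the basic and medium questions above (SparkSession, map vs flatMap on RDDs, DataFrames, joins, lazy evaluation, caching, and UDFs); all data and names are invented for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("interview-prep").getOrCreate()

    # RDD basics: map emits one output per input, flatMap may emit many.
    lines = spark.sparkContext.parallelize(["a b", "c"])
    print(lines.map(lambda s: s.split(" ")).collect())      # [['a', 'b'], ['c']]
    print(lines.flatMap(lambda s: s.split(" ")).collect())  # ['a', 'b', 'c']

    # DataFrames carry a schema, which lets Spark's optimizer plan the work.
    people = spark.createDataFrame([(1, "Asha"), (2, "Ravi")], ["id", "name"])
    orders = spark.createDataFrame([(1, 250.0), (1, 90.0), (2, 40.0)], ["id", "amount"])

    # An inner join; cache() marks the result for reuse, and nothing executes
    # until an action such as count() or show() triggers the lazy plan.
    joined = people.join(orders, on="id", how="inner").cache()
    print(joined.count())

    # A simple UDF; prefer built-in functions where they exist, since UDFs
    # bypass many of Spark's optimizations.
    shout = F.udf(lambda s: s.upper(), StringType())
    joined.withColumn("name_upper", shout(F.col("name"))).show()

    spark.stop()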

Closing Remark

As you explore PySpark jobs in India, remember to prepare thoroughly for interviews and showcase your expertise confidently. With the right skills and knowledge, you can excel in this field and advance your career in the world of big data and analytics. Good luck!


Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies