
3895 PySpark Jobs - Page 36

JobPe aggregates job listings for easy access; applications are submitted directly on the original job portal.

6.0 - 11.0 years

14 - 19 Lacs

Gurugram

Work from Office

Source: Naukri

JOB OBJECTIVE
Manager with 6+ years of good hands-on experience in developing state-of-the-art, scalable machine learning models and operationalizing them, leveraging off-the-shelf workbenches for production.

KEY RESPONSIBILITIES / NECESSARY SKILLS
6+ years of experience in model development using Python/PySpark libraries; development on a Databricks or Dataiku DSS (Data Science Studio) environment would be a plus. Strong experience with Spark using Scala/Python/Java. Strong proficiency in building, training, and evaluating state-of-the-art machine learning models and deploying them. Proficiency in statistical and probabilistic methods such as SVM, decision trees, bagging and boosting techniques, and clustering. Proficiency in core NLP techniques such as text classification, named entity recognition (NER), topic modeling, and sentiment analysis; understanding of Generative AI / Large Language Models / Transformers would be a plus. Hands-on experience with Python data-science and math packages such as NumPy, Pandas, scikit-learn, Seaborn, PyCaret, and Matplotlib. Proficiency in Python and common machine learning frameworks (TensorFlow, NLTK, Stanford NLP, PyTorch, LingPipe, Caffe, Keras, SparkML, OpenAI, etc.). Experience working in large teams and using collaboration tools such as Git, Jira, and Confluence. Good understanding of any of the cloud platforms: AWS, Azure, or GCP. Understanding of the commercial pharma landscape and patient data/analytics would be a huge plus. Should be willing to learn, accept a challenging environment, and be confident in delivering results within timelines. Should be self-motivated and self-driven to find solutions to problems. Should be able to mentor and guide mid- to large-sized teams.
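For orientation only, the sketch below illustrates one of the core NLP tasks this posting lists (text classification) as a minimal scikit-learn pipeline. The documents, labels, and model choices are invented for illustration and are not part of the job requirements.

```python
# Hypothetical example: TF-IDF features plus logistic regression for text classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

docs = [
    "patient reported mild headache after dose increase",   # invented sample text
    "quarterly invoice overdue, escalate to finance team",
    "adverse event logged during follow-up visit",
]
labels = ["clinical", "finance", "clinical"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),          # unigram + bigram features
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(docs, labels)
print(clf.predict(["dosage review scheduled at next visit"]))
```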

Posted 1 week ago

Apply

5.0 - 10.0 years

11 - 15 Lacs

Gurugram

Work from Office

Source: Naukri

Position Summary
This is a requisition for an employee referral campaign and the JD is generic. We are looking for Associates with 5+ years of experience in delivering solutions around data engineering, big data analytics and data lakes, MDM, BI, and data visualization; experienced in integrating and standardizing structured and unstructured data to enable faster insights using cloud technology, enabling data-driven insights across the enterprise.

Job Responsibilities
He/she should be able to design, implement, and deliver complex Data Warehousing/Data Lake, Cloud Data Management, and Data Integration project assignments. Technical Design and Development: expertise in any of the following skills. Any ETL tool (Informatica, Talend, Matillion, DataStage), and hosting technologies like the AWS stack (Redshift, EC2), is mandatory. Any BI tool among Tableau, Qlik, Power BI, and MSTR. Informatica MDM, Customer Data Management. Expert knowledge of SQL with the capability to performance-tune complex SQL queries in traditional and distributed RDBMS systems is a must. Experience across Python, PySpark, and Unix/Linux shell scripting. Project management is a must-have: should be able to create simple to complex project plans in Microsoft Project and think in advance about potential risks and mitigation plans. Task Management: should be able to onboard the team on the project plan and delegate tasks to accomplish milestones as per plan; should be comfortable discussing and prioritizing work items with team members in an onshore-offshore model. Handle Client Relationship: manage client communication and client expectations independently or with the support of the reporting manager; should be able to deliver results back to the client as per plan; should have excellent communication skills.

Education
Bachelor of Technology; Master's Equivalent - Engineering

Work Experience
Overall, 5-7 years of relevant experience in Data Warehousing and Data Management projects, with some experience in the Pharma domain. We are hiring for the following roles across data management tech stacks: ETL tools among Informatica, IICS/Snowflake, Python, Matillion, and other cloud ETL; BI tools among Power BI and Tableau; MDM - Informatica/Reltio, Customer Data Management; Azure cloud developer using Data Factory and Databricks; Data Modeler - modelling of data, understanding source data, creating data models for landing and integration; Python/PySpark - Spark/PySpark design, development, and deployment.

Posted 1 week ago

Apply

2.0 - 4.0 years

8 - 12 Lacs

Noida

Work from Office

Source: Naukri

Position Summary
This role will be responsible for patient journey analysis and working with patient-level data to develop robust solutions for the client's teams: an expert in patient analytics who can guide and lead the team supporting pharma clients.

Job Responsibilities
Effectively manage the client/onshore stakeholders, as per the business needs, to ensure successful business delivery. Work closely with the project manager to define the algorithm, break down the problem into execution steps, and run the analysis. Ensure high-quality analytics solutions/reports for the client. The delivery role will include project scoping, solution design, execution, and communication of the analysis in client-ready formats. Contribute towards Axtria tools and capabilities as per the business requirements. Build organization capabilities by participating in hackathons, solution design, and process automation. Effectively communicate with onshore/client teams (as per business needs).

Education
BE/B.Tech in IT or Computer Science; Master's Diploma in Business Administration

Work Experience
Overall, 3-5 years of rich experience in the Pharmaceutical/Life Sciences domain. We are looking for experts in commercial pharmaceutical analytics: HCP analytics, payer analytics, and patient analytics. Worked on advanced analytics in the pharma domain throughout the patient journey (line of therapy, switch analysis, source of business, segmentation, persistence & compliance, adherence, patient identification, etc.) using various data sources. Experience using various patient-level data such as APLD, LAAD, EMR, patient registries, prescription data, formulary data, etc. Strong in logical reasoning, structuring of analysis, asking the right questions, and a logical approach to analyzing data, problems, and situations. Experience in pharmaceutical sales and marketing analytics would be preferred. Relevant experience in statistical/modeling knowledge, ability to transform data into insights, and good data visualization/reporting skills. Good to have work experience in building statistical models and/or AI/ML models using Python, R-Studio, PySpark, Keras, and TensorFlow. Technical knowledge: R/Python/SQL. Knowledge of self-service analytics platforms such as Dataiku/KNIME/Alteryx will be an added advantage. MS Excel knowledge is mandatory.

Behavioural Competencies: Teamwork & Leadership; Motivation to Learn and Grow; Ownership; Cultural Fit; Project Management; Communication
Technical Competencies: Python; R; SQL; Excel; Pharma Commercial Know-How
Others: Patient Data Analytics
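To make "switch analysis" concrete, here is a minimal, hypothetical PySpark sketch that flags a switch whenever a patient's prescribed product changes between consecutive prescriptions. The column names (patient_id, rx_date, product) and sample rows are invented; real APLD/LAAD layouts and business rules will differ.

```python
# Hedged sketch only: flag therapy switches per patient using a window function.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("switch_analysis").getOrCreate()

rx = spark.createDataFrame(
    [("p1", "2024-01-05", "BrandA"),
     ("p1", "2024-02-10", "BrandB"),
     ("p2", "2024-01-20", "BrandA"),
     ("p2", "2024-03-01", "BrandA")],
    ["patient_id", "rx_date", "product"],   # invented column names
)

w = Window.partitionBy("patient_id").orderBy("rx_date")
switches = (
    rx.withColumn("prev_product", F.lag("product").over(w))
      .withColumn("is_switch",
                  F.col("prev_product").isNotNull() &
                  (F.col("prev_product") != F.col("product")))
)

# Count switches into each product (empty prev_product rows are first fills, not switches)
switches.groupBy("product").agg(
    F.sum(F.col("is_switch").cast("int")).alias("switches_to")
).show()
```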

Posted 1 week ago

Apply

4.0 - 6.0 years

8 - 12 Lacs

Noida

Work from Office

Source: Naukri

Position Summary
A driven business analyst who can work on complex analytical problems and help the customer make better business decisions, especially in the Pharma domain.

Job Responsibilities
Effectively manage the client/onshore stakeholders, as per the business needs, to ensure successful business delivery. Work closely with the project manager to define the algorithm, break down the problem into execution steps, and run the analysis. Ensure high-quality analytics solutions/reports for the client. The delivery role will include project scoping, solution design, execution, and communication of the analysis in client-ready formats. Contribute towards Axtria tools and capabilities as per the business requirements. Build organization capabilities by participating in hackathons, solution design, and process automation. Effectively communicate with onshore/client teams (as per business needs).

Education
Bachelor of Engineering in Statistics

Work Experience
Overall, 4-6 years of rich experience in the Pharmaceutical/Life Sciences domain. We are looking for experts in commercial pharmaceutical analytics: HCP analytics, payer analytics, and patient analytics. Worked on advanced analytics in the pharma domain throughout the patient journey (line of therapy, switch analysis, source of business, segmentation, persistence & compliance, adherence, patient identification, etc.) using various data sources. Experience using various patient-level data such as APLD, LAAD, EMR, patient registries, prescription data, formulary data, etc. Can work across a variety of projects spanning advanced analytics, ad-hoc analysis, and reporting. Effectively communicate with onshore/client teams (as per business needs). Relevant experience in statistical/modeling knowledge, ability to transform data into insights, and good data visualization/reporting skills. Good to have work experience in building statistical models and/or AI/ML models using Python, R-Studio, PySpark, Keras, and TensorFlow. Technical knowledge: R/Python/SQL. Knowledge of self-service analytics platforms such as Dataiku/KNIME/Alteryx will be an added advantage. MS Excel knowledge is mandatory.

Behavioural Competencies: Teamwork & Leadership; Motivation to Learn and Grow; Ownership; Cultural Fit; Project Management; Communication
Technical Competencies: Python; R; SQL; Excel; MMx; Forecasting; Machine Learning; Pharma Commercial Know-How; HEOR EPI and Economic Analysis; HEOR Simulation Analysis; Patient Data Analytics Know-How; Dataiku; KNIME; Others

Posted 1 week ago

Apply

10.0 years

0 Lacs

Mumbai Metropolitan Region

On-site

Source: LinkedIn

JOB DESCRIPTION
Manager / DGM - Data Platform
Godrej Consumer Products Limited (GCPL), Mumbai, Maharashtra, India

Job Title: Manager / DGM - Data Platform
Job Type: Permanent, Full-time
Function: Information Technology
Business: Godrej Consumer Products Limited
Location: Mumbai, Maharashtra, India

About Godrej Industries Limited and Associate Companies (GILAC)
GILAC is a holding company of the Godrej Group. We have significant interests in consumer goods, real estate, agriculture, chemicals, and financial services through our subsidiary and associate companies, across 18 countries. https://www.godrejindustries.com/

About Godrej Consumer Products Limited (GCPL)
Godrej Consumer Products is a leading emerging markets company. As part of the over 125-year young Godrej Group, we are fortunate to have a proud legacy built on the strong values of trust, integrity and respect for others. At the same time, we are growing fast and have exciting, ambitious aspirations. https://www.godrejcp.com/

About the role
The role holder will act as a Data Engineering Project Lead, responsible for the implementation and support of data engineering projects (primarily on the Microsoft Azure platform) through our partner ecosystem for our businesses globally. The responsibility also includes evaluating and implementing new features and products of the Azure data platform, such as Gen AI, and driving standardization of the Azure technology stack and of data engineering and coding best practices for Azure projects.

Key Responsibilities
Designing and implementing scalable and secure data processing pipelines using Azure Data Factory, Azure Databricks, and other Azure services. Managing and optimizing data storage using Azure Data Lake Storage and Azure SQL Data Warehouse. Developing data models and maintaining data architecture to support data analytics and business intelligence reporting. Ensuring data quality and consistency through data cleaning, transformation, and integration processes. Monitoring and troubleshooting data-related issues within the Azure environment to maintain high availability and performance. Collaborating with data scientists, business analysts, and other stakeholders to understand data requirements and implement appropriate data solutions. Implementing data security measures, including encryption, access controls, and auditing, to protect sensitive information. Automating data pipelines and workflows to streamline data ingestion, processing, and distribution tasks. Utilizing Azure's analytics services, such as Azure Synapse Analytics, to provide insights and support data-driven decision-making. Keeping abreast of the latest Azure features and technologies to enhance data engineering processes and capabilities. Documenting data procedures, systems, and architectures to maintain clarity and ensure compliance with regulatory standards. Providing guidance and support for data governance, including metadata management, data lineage, and data cataloging.

Who are we looking for?
Education: BE/B.Tech in Computer Science from a premier institute; MBA preferred; Azure cloud data engineering certifications.
Experience: 10 years of overall experience and at least 5 years of experience in Azure data engineering.
Skills: Azure Data Factory and data pipeline orchestration; Azure Databricks and big data processing; Azure Synapse Analytics and data warehousing; data modeling and database design; SQL and NoSQL database technologies; data lake storage and management; Power BI and data visualization (optional); machine learning and AI integration with Azure ML; Python, PySpark, PySQL (Spark) programming; data security and compliance within Azure.

What's in it for you?
Be an equal parent: maternity support, including paid leave ahead of statutory guidelines, and flexible work options on return; paternity support, including paid leave; new mothers can bring a caregiver and children under a year old on work travel; adoption support, gender neutral and based on the primary caregiver, with paid leave options.
No place for discrimination at Godrej: gender-neutral anti-harassment policy; same-sex partner benefits at par with married spouses; gender transition support.
We are selfish about your wellness: comprehensive health insurance plans, as well as accident coverage for you and your family, with top-up options; uncapped sick leave; mental wellness and self-care programmes, resources and counselling.
Celebrating wins, the Godrej Way: structured recognition platforms for individual, team and business-level achievements; performance-based earning opportunities. https://www.godrejcareers.com/benefits/

An inclusive Godrej
Before you go, there is something important we want to highlight. There is no place for discrimination at Godrej. Diversity is the philosophy of who we are as a company, and has been for over a century. It's not just in our DNA and nice to do. Being more diverse, especially having our team members reflect the diversity of our businesses and communities, helps us innovate better and grow faster. We hope this resonates with you. We take pride in being an equal opportunities employer. We recognise merit and encourage diversity. We do not tolerate any form of discrimination on the basis of nationality, race, colour, religion, caste, gender identity or expression, sexual orientation, disability, age, or marital status and ensure equal opportunities for all our team members. If this sounds like a role for you, apply now! We look forward to meeting you.

Posted 1 week ago

Apply

4.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site

Source: LinkedIn

The Business Analytics Analyst is a developing professional role. Applies specialty area knowledge in monitoring, assessing, analyzing and/or evaluating processes and data. Identifies policy gaps and formulates policies. Interprets data and makes recommendations. Researches and interprets factual information. Identifies inconsistencies in data or results, defines business issues and formulates recommendations on policies, procedures or practices. Integrates established disciplinary knowledge within own specialty area with a basic understanding of related industry practices. Good understanding of how the team interacts with others in accomplishing the objectives of the area. Develops working knowledge of industry practices and standards. Limited but direct impact on the business through the quality of the tasks/services provided. Impact of the job holder is restricted to own team.

What do we do?
The TTS Analytics team provides analytical insights to the Product, Pricing, Client Experience and Sales functions within the global Treasury & Trade Services business. The team works on business problems focused on driving acquisitions, cross-sell, revenue growth & improvements in client experience. The team extracts relevant insights, identifies business opportunities, converts business problems into analytical frameworks, and uses big data tools and AI/ML techniques to drive data-driven business outcomes in collaboration with business and product partners.

Role Description
The role will be Business Analytics Analyst (C10) in the TTS Analytics team, reporting to the AVP leading the team. This role will be a key contributor to ideation on analytical projects to tackle strategic business priorities. The role will involve working on multiple analyses through the year on business problems across the client life cycle (acquisition, engagement, client experience and retention) for the TTS business. This will involve leveraging multiple analytical approaches, tools and techniques, working on multiple data sources (client profile & engagement data, transactions & revenue data, digital data, unstructured data like call transcripts, etc.) to provide data-driven insights to business partners and functional stakeholders. Identifies data patterns & trends, and provides insights to enhance business decision-making capability in business planning, process improvement, solution assessment, etc. Recommends actions for future developments & strategic business opportunities, as well as enhancements to operational policies. Translates data into consumer or customer behavioral insights to drive targeting and segmentation strategies, and communicates all findings clearly and effectively to business partners and senior leaders. Continuously improves processes and strategies by exploring and evaluating new data sources, tools, and capabilities. Works closely with internal business partners in building, implementing, tracking and improving decision strategies. Appropriately assesses risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.
Qualifications
Experience: Bachelor's Degree with 4+ years of experience in data analytics, or Master's Degree with 2+ years of experience in data analytics. Identifying and resolving business problems (around sales/marketing strategy optimization, pricing optimization, client experience, cross-sell and retention), preferably in the financial services industry. Utilizing text data to derive business value by leveraging different NLP techniques (mid to expert level prior experience is a must). Leveraging and developing analytical tools and methods to identify patterns, trends and outliers in data. Applying predictive modeling techniques to a wide range of business problems. Working with data from different sources, with different complexities, both structured and unstructured.
Skills:
Analytical Skills: Strong logical reasoning and problem-solving ability. Proficient in converting business problems into analytical tasks, and analytical findings into business insights. Proficient in formulating analytical methodology, identifying trends and patterns with data. Has the ability to work hands-on to retrieve and manipulate data from big data environments.
Tools and Platforms: Expert knowledge in Python, SQL, PySpark and related tools. Proficient in MS Excel, PowerPoint.
This job description provides a high-level review of the types of work performed. Other job-related duties may be assigned as required.
Job Family Group: Decision Management
Job Family: Business Analysis
Time Type: Full time
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. View Citi's EEO Policy Statement and the Know Your Rights poster.

Posted 1 week ago

Apply

4.0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Source: LinkedIn

We are seeking a skilled and motivated Data Engineer to join our dynamic team. The ideal candidate will have experience in designing, developing, and maintaining scalable data pipelines and architectures using Hadoop, PySpark, ETL processes, and cloud technologies.

Role: Senior Data Engineer
Experience: 4-8 years
Job locations: Coimbatore, Chennai, Bangalore, Hyderabad

Responsibilities
Design, develop, and maintain data pipelines for processing large-scale datasets. Build efficient ETL workflows to transform and integrate data from multiple sources. Develop and optimize Hadoop and PySpark applications for data processing. Ensure data quality, governance, and security standards are met across systems. Implement and manage cloud-based data solutions (AWS, Azure, or GCP). Collaborate with data scientists and analysts to support business intelligence initiatives. Troubleshoot performance issues and optimize query executions in big data environments. Stay updated with industry trends and advancements in big data and cloud technologies.

Required Skills
Strong programming skills in Python, Scala, or Java. Hands-on experience with the Hadoop ecosystem (HDFS, Hive, Spark, etc.). Expertise in PySpark for distributed data processing. Proficiency in ETL tools and workflows (SSIS, Apache NiFi, or custom pipelines). Experience with cloud platforms (AWS, Azure, GCP) and their data-related services. Knowledge of SQL and NoSQL databases. Familiarity with data warehousing concepts and data modeling techniques. Strong analytical and problem-solving skills.

Interested candidates can contact us at +91 7305206696 / saranyadevib@talentien.com

Skills: SQL, data warehousing, AWS, cloud, Hadoop, Scala, Java, Python, data engineering, Azure, cloud technologies (AWS, Azure, GCP), ETL processes, data modeling, NoSQL, PySpark, ETL

Posted 1 week ago

Apply

6.0 - 10.0 years

10 - 20 Lacs

Chennai, Bengaluru

Hybrid

Source: Naukri

Hi,
Work Location: Chennai and Bangalore
Notice period: Immediate to 30 days
Primary skills: Azure Databricks, ADF, PySpark, SQL

Sharing the JD for your reference:
Overall, 6-12 years of IT experience, preferably in cloud. Minimum 4 years in Azure Databricks on development projects. Should be 100% hands-on in PySpark coding. Should have strong SQL expertise in writing advanced/complex SQL queries. DWH experience is a must for this role. Experience in programming using Python is an advantage. Experience in data ingestion, preparation, integration, and operationalization techniques to optimally address the data requirements. Should be able to understand system architecture involving Data Lakes, Data Warehouses and Data Marts. Experience owning end-to-end development, including coding, testing, debugging and deployment. Excellent communication is required for this role.

Kindly share the following details: Updated CV, Relevant Skills, Total Experience, Current Company, Current CTC, Expected CTC, Notice Period, Current Location, Preferred Location

Posted 1 week ago

Apply

7.0 - 12.0 years

18 - 30 Lacs

Bengaluru

Work from Office

Source: Naukri

Urgently hiring for a Senior Azure Data Engineer.
Job Location: Bangalore
Minimum experience: 7+ years total, with at least 4 years of relevant experience
Keywords: Databricks, PySpark, Scala, SQL, live/streaming data, batch processing data
Share CV to siddhi.pandey@adecco.com or call 6366783349

Roles and Responsibilities:
The Data Engineer will work on data engineering projects for various business units, focusing on delivery of complex data management solutions by leveraging industry best practices. They work with the project team to build the most efficient data pipelines and data management solutions that make data easily available for consuming applications and analytical solutions. A Data Engineer is expected to possess strong technical skills.

Key Characteristics
Technology champion who constantly pursues skill enhancement and has an inherent curiosity to understand work from multiple dimensions. Interest and passion in Big Data technologies and appreciation of the value that an effective data management solution can bring. Has worked on real data challenges and handled high volume, velocity, and variety of data. Excellent analytical and problem-solving skills, willingness to take ownership and resolve technical challenges. Contributes to community-building initiatives like CoE and CoP.

Mandatory skills: Azure (Master); ELT (Skill); Data Modeling (Skill); Data Integration & Ingestion (Skill); Data Manipulation and Processing (Skill); GitHub, GitHub Actions, Azure DevOps (Skill); Data Factory, Databricks, SQL DB, Synapse, Stream Analytics, Glue, Airflow, Kinesis, Redshift, SonarQube, PyTest (Skill)
Optional skills: Experience in project management, running a scrum team. Experience working with BPC, Planning. Exposure to working with an external technical ecosystem. MkDocs documentation.

Share CV to siddhi.pandey@adecco.com or call 6366783349

Posted 1 week ago

Apply

2.0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Source: LinkedIn

Overview
The Data Science Team works on developing Machine Learning (ML) and Artificial Intelligence (AI) projects. The specific scope of this role is to develop ML solutions in support of ML/AI projects using big analytics toolsets in a CI/CD environment. Analytics toolsets may include DS tools, Spark, Databricks, and other technologies offered by Microsoft Azure or open-source toolsets. This role will also help automate the end-to-end cycle with Azure Pipelines. You will be part of a collaborative interdisciplinary team around data, where you will be responsible for our continuous delivery of statistical/ML models. You will work closely with process owners, product owners and final business users. This will provide you the correct visibility and understanding of the criticality of your developments.

Responsibilities
Delivery of key Advanced Analytics/Data Science projects within time and budget, particularly around DevOps/MLOps and Machine Learning models in scope. Active contributor to code and development in projects and services. Partner with data engineers to ensure data access for discovery and that proper data is prepared for model consumption. Partner with ML engineers working on industrialization. Communicate with business stakeholders in the process of service design, training and knowledge transfer. Support large-scale experimentation and build data-driven models. Refine requirements into modelling problems. Influence product teams through data-based recommendations. Research state-of-the-art methodologies. Create documentation for learnings and knowledge transfer. Create reusable packages or libraries. Ensure on-time and on-budget delivery which satisfies project requirements, while adhering to enterprise architecture standards. Leverage big data technologies to help process data and build scaled data pipelines (batch to real time). Implement the end-to-end ML lifecycle with Azure Databricks and Azure Pipelines. Automate ML model deployments.

Qualifications
BE/B.Tech in Computer Science, Maths, or technical fields. Overall 2-4 years of experience working as a Data Scientist. 2+ years' experience building solutions in the commercial or supply chain space. 2+ years working in a team to deliver production-level analytic solutions. Fluent in git (version control); understanding of Jenkins and Docker is a plus. Fluent in SQL syntax. 2+ years' experience in statistical/ML techniques to solve supervised (regression, classification) and unsupervised problems. 2+ years' experience in developing business-problem-related statistical/ML modeling with industry tools, with a primary focus on Python or PySpark development. Data Science: hands-on experience and strong knowledge of building machine learning models, supervised and unsupervised; knowledge of time series/demand forecast models is a plus. Programming skills: hands-on experience in statistical programming languages like Python and PySpark and database query languages like SQL. Statistics: good applied statistical skills, including knowledge of statistical tests, distributions, regression, and maximum likelihood estimators. Cloud (Azure): experience in Databricks and ADF is desirable; familiarity with Spark, Hive, and Pig is an added advantage. Business storytelling and communicating data insights in a business-consumable format; fluent in one visualization tool. Strong communication and organizational skills with the ability to deal with ambiguity while juggling multiple priorities. Experience with Agile methodology for teamwork and analytics 'product' creation.
Experience in Reinforcement Learning is a plus. Experience in simulation and optimization problems in any space is a plus. Experience with Bayesian methods is a plus. Experience with causal inference is a plus. Experience with NLP is a plus. Experience with Responsible AI is a plus. Experience with distributed machine learning is a plus. Experience in DevOps, with hands-on experience with one or more cloud service providers (AWS, GCP, Azure preferred). Model deployment experience is a plus. Experience with version control systems like GitHub and CI/CD tools. Experience in Exploratory Data Analysis. Knowledge of MLOps/DevOps and deploying ML models is preferred. Experience using MLflow, Kubeflow, etc. will be preferred. Experience executing and contributing to MLOps automation infrastructure is good to have. Exceptional analytical and problem-solving skills. Stakeholder engagement with BUs and vendors. Experience building statistical models in the Retail or Supply Chain space is a plus.
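For illustration only, below is a minimal sketch of the MLflow tracking pattern that typically underpins the Databricks/Azure Pipelines ML lifecycle this posting mentions. The dataset, model, run name, and parameter values are synthetic and hypothetical, not part of the role's actual stack.

```python
# Hypothetical MLflow tracking example: log params, a metric, and the fitted model.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="demand_forecast_baseline"):   # invented run name
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("mae", mae)
    mlflow.sklearn.log_model(model, "model")   # picked up downstream by CI/CD for deployment
```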

Posted 1 week ago

Apply

5.0 - 8.0 years

22 - 30 Lacs

Noida, Hyderabad, Bengaluru

Hybrid

Source: Naukri

Role: Data Engineer
Experience: 5 to 8 years
Location: Bangalore, Noida, and Hyderabad (hybrid; 2 days per week in office is mandatory)
Notice period: Immediate to 15 days (only immediate joiners)
Note: Candidates must have experience in Python, Kafka Streams, PySpark, and Azure Databricks. We are not looking for candidates who have experience only in PySpark and not in Python.

Job Title: SSE - Kafka, Python, and Azure Databricks (Healthcare Data Project)

Role Overview: We are looking for a highly skilled engineer with expertise in Kafka, Python, and Azure Databricks (preferred) to drive our healthcare data engineering projects. The ideal candidate will have deep experience in real-time data streaming, cloud-based data platforms, and large-scale data processing. This role requires strong technical leadership, problem-solving abilities, and the ability to collaborate with cross-functional teams.

Key Responsibilities: Lead the design, development, and implementation of real-time data pipelines using Kafka, Python, and Azure Databricks. Architect scalable data streaming and processing solutions to support healthcare data workflows. Develop, optimize, and maintain ETL/ELT pipelines for structured and unstructured healthcare data. Ensure data integrity, security, and compliance with healthcare regulations (HIPAA, HITRUST, etc.). Collaborate with data engineers, analysts, and business stakeholders to understand requirements and translate them into technical solutions. Troubleshoot and optimize Kafka streaming applications, Python scripts, and Databricks workflows. Mentor junior engineers, conduct code reviews, and ensure best practices in data engineering. Stay updated with the latest cloud technologies, big data frameworks, and industry trends.

Required Skills & Qualifications: 4+ years of experience in data engineering, with strong proficiency in Kafka and Python. Expertise in Kafka Streams, Kafka Connect, and Schema Registry for real-time data processing. Experience with Azure Databricks (or willingness to learn and adopt it quickly). Hands-on experience with cloud platforms (Azure preferred; AWS or GCP is a plus). Proficiency in SQL, NoSQL databases, and data modeling for big data processing. Knowledge of containerization (Docker, Kubernetes) and CI/CD pipelines for data applications. Experience working with healthcare data (EHR, claims, HL7, FHIR, etc.) is a plus. Strong analytical skills, a problem-solving mindset, and the ability to lead complex data projects. Excellent communication and stakeholder management skills.

Email: Sam@hiresquad.in
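A hedged sketch of the real-time pattern this posting describes: PySpark Structured Streaming reading from Kafka and landing records to a Delta table on Databricks. The broker address, topic name, schema, and paths are placeholders, and healthcare-specific parsing and PHI handling are deliberately omitted.

```python
# Minimal Kafka -> Delta streaming sketch with invented topic, schema, and paths.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("claims_stream").getOrCreate()

schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "claims-events")                # placeholder topic
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(F.from_json("json", schema).alias("c"))
             .select("c.*"))

query = (parsed.writeStream
         .format("delta")                                   # Databricks Delta sink
         .option("checkpointLocation", "/tmp/chk/claims")   # placeholder path
         .outputMode("append")
         .start("/tmp/delta/claims_bronze"))                # placeholder path
```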

Posted 1 week ago

Apply

15.0 years

0 Lacs

Mumbai, Maharashtra, India

On-site

Source: LinkedIn

Introduction
A career in IBM Consulting is rooted in long-term relationships and close collaboration with clients across the globe. You'll work with visionaries across multiple industries to improve the hybrid cloud and AI journey for the most innovative and valuable companies in the world. Your ability to accelerate impact and make meaningful change for your clients is enabled by our strategic partner ecosystem and our robust technology platforms across the IBM portfolio, including Software and Red Hat. Curiosity and a constant quest for knowledge serve as the foundation to success in IBM Consulting. In your role, you'll be encouraged to challenge the norm, investigate ideas outside of your role, and come up with creative solutions resulting in groundbreaking impact for a wide network of clients. Our culture of evolution and empathy centers on long-term career growth and development opportunities in an environment that embraces your unique skills and experience.

Your Role And Responsibilities
Location: Mumbai
Role Overview: As a Big Data Engineer, you'll design and build robust data pipelines on Cloudera using Spark (Scala/PySpark) for ingestion, transformation, and processing of high-volume data from banking systems.
Key Responsibilities: Build scalable batch and real-time ETL pipelines using Spark and Hive. Integrate structured and unstructured data sources. Perform performance tuning and code optimization. Support orchestration and job scheduling (NiFi, Airflow).

Preferred Education: Master's Degree
Required Technical And Professional Expertise: Experience: 3-15 years. Proficiency in PySpark/Scala with Hive/Impala. Experience with data partitioning, bucketing, and optimization. Familiarity with Kafka, Iceberg, and NiFi is a must. Knowledge of banking or financial datasets is a plus.

Posted 1 week ago

Apply

2.0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site

Source: LinkedIn

Job Title: GCP Data Engineer
Location: Chennai, India
Job type: FTE
Mandatory Skills: Google Cloud Platform (BigQuery, Dataflow, Dataproc, Data Fusion, Cloud SQL), Terraform, Tekton, Airflow, Postgres, PySpark, Python, API development

Job Description
2+ years in GCP services: BigQuery, Dataflow, Dataproc, Dataplex, Data Fusion, Terraform, Tekton, Cloud SQL, Redis Memorystore, Airflow, Cloud Storage. 2+ years in data transfer utilities. 2+ years in Git or any other version control tool. 2+ years in Confluent Kafka. 1+ years of experience in API development. 2+ years in an Agile framework. 4+ years of strong experience in Python and PySpark development. 4+ years of shell scripting to develop ad hoc jobs for data importing/exporting.

Posted 1 week ago

Apply

3.0 - 7.0 years

22 - 25 Lacs

Bengaluru

Hybrid

Source: Naukri

Role & responsibilities
3-6 years of experience in data engineering pipeline ownership and quality assurance, with hands-on expertise in building, testing, and maintaining data pipelines. Proficiency with Azure Data Factory (ADF), Azure Databricks (ADB), and PySpark for data pipeline orchestration and processing large-scale datasets. Strong experience in writing SQL queries and performing data validation, data profiling, and schema checks. Experience with big data validation, including schema enforcement, data integrity checks, and automated anomaly detection. Ability to design, develop, and implement automated test cases to monitor and improve data pipeline efficiency. Deep understanding of the Medallion Architecture (Raw, Bronze, Silver, Gold) for structured data flow management. Hands-on experience with Apache Airflow for scheduling, monitoring, and managing workflows. Strong knowledge of Python for developing data quality scripts, test automation, and ETL validations. Familiarity with CI/CD pipelines for deploying and automating data engineering workflows. Solid data governance and data security practices within the Azure ecosystem.

Additional Requirements: Ownership of data pipelines, ensuring end-to-end execution, monitoring, and proactive troubleshooting of failures. Strong stakeholder management skills, including follow-ups with business teams across multiple regions to gather requirements, address issues, and optimize processes. Time flexibility to align with global teams for efficient communication and collaboration. Excellent problem-solving skills with the ability to simulate and test edge cases in data processing environments. Strong communication skills to document and articulate pipeline issues, troubleshooting steps, and solutions effectively. Experience with Unity Catalog or willingness to learn.

Preferred candidate profile: Immediate joiners
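For context, a minimal sketch of the kind of automated pipeline checks this posting describes (row counts, schema enforcement, null and duplicate key checks) written in PySpark. The table name and expected schema are hypothetical Medallion-layer examples, not any specific client setup.

```python
# Hypothetical data quality gate for a Silver-layer table.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

expected_schema = StructType([
    StructField("order_id", StringType(), False),
    StructField("amount", DoubleType(), True),
])

df = spark.table("silver.orders")          # hypothetical Silver-layer table

checks = {
    "schema_matches": df.schema == expected_schema,
    "non_empty": df.count() > 0,
    "no_null_keys": df.filter(F.col("order_id").isNull()).count() == 0,
    "no_duplicate_keys": df.count() == df.select("order_id").distinct().count(),
}

failed = [name for name, ok in checks.items() if not ok]
if failed:
    # In practice this would fail the ADF/Airflow task and alert the owner.
    raise ValueError(f"Data quality checks failed: {failed}")
```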

Posted 1 week ago

Apply

5.0 - 6.0 years

12 - 22 Lacs

Hyderabad

Hybrid

Source: Naukri

Role & responsibilities
Completes the delivery of design, code or testing for modules or multiple functions related to IS development initiatives. Prepares requirement definitions, designs, and technical specifications. Provides coding, testing and implementation support for the identified technical platform (i.e., Mainframe, Mid-range, Distributed or Web). Analyzes user requirements, and defines technical project scope and assumptions for assigned tasks. Creates business and/or technical designs for new systems, and/or modifications to existing systems.

Must-Have Skills: 4+ years of development experience with Python/AWS technologies. Hands-on experience with Python, PySpark, AWS and SQL. AWS services required: S3, Lambda, DynamoDB, etc. Working experience in TDD and BDD frameworks. Provide technical direction on design considerations, including performance, scalability, availability, maintainability, and auditability. Strong customer-facing experience. Propose and design the solution approach to cater to business requirements by building/enhancing reusable components. Working experience with the SAFe delivery model. Good organizational and written/verbal communication skills. Good presentation skills. A positive attitude and team focus are required.

Good-to-Have Skills: Experience with IBM Spectrum Conductor is an added advantage. Python utilities for interacting with Flask APIs and the SharePoint API are an added advantage.

Posted 1 week ago

Apply

10.0 - 15.0 years

8 - 18 Lacs

Kochi

Remote

Source: Naukri

10 years of experience working in cloud-native data platforms (Azure preferred), Databricks, SQL, PySpark; migrating from Hive Metastore to Unity Catalog; implementing Row-Level Security (RLS); metadata-driven ETL design patterns; Databricks certifications.

Posted 1 week ago

Apply

8.0 - 13.0 years

18 - 33 Lacs

Bengaluru

Hybrid

Source: Naukri

Warm Greetings from SP Staffing!!
Role: AWS Data Engineer
Experience Required: 8 to 15 years
Work Location: Bangalore
Required Skills: Technical knowledge of data engineering solutions and practices. Implementation of data pipelines using tools like EMR, AWS Glue, AWS Lambda, AWS Step Functions, API Gateway, and Athena. Proficient in Python and Spark, with a focus on ETL data processing and data engineering practices.
Interested candidates can send resumes to nandhini.spstaffing@gmail.com

Posted 1 week ago

Apply

4.0 - 7.0 years

0 Lacs

Mumbai, Maharashtra, India

On-site

Source: LinkedIn

About PwC: PricewaterhouseCoopers (PwC) is a leading global consulting firm. For more than 160 years, PwC has worked to build trust in society and solve important problems for clients and the communities in which we live and work. Today we have more than 276,000 people across 157 countries working towards this goal. The US Advisory Bangalore Acceleration Center is a natural extension of our United States based consulting capabilities, providing support to a broad range of practice teams. Our US-owned ACs are fully integrated into our client-facing teams and are key to PwC's success in the marketplace.

Job Summary: At PwC, we are betting big on data, analytics, and a digital revolution to transform the way deals are done. Analytics is increasingly a major driver of competitive advantage in deal-making and value creation for private equity owned portfolio companies. PwC brings data-driven insights through advanced techniques to help clients make better strategic decisions, uncover value, and improve returns on their investments. The PwC Deal Analytics & Value Creation practice is a blend of deals and consulting professionals with diverse skills and backgrounds, including financial, commercial, operational, and data science. We support private equity and corporate clients across all phases of the deal lifecycle, including diligence, post-deal, and preparation for exit/divestiture. Our data-driven approach delivers insights in diligence at deal speed, works with clients to improve performance post-deal, and brings a commercial insights lens through third-party and alternative data to help inform decisions. A career in our fast-paced Deal Analytics & Value Creation practice, a business unit within the PwC deals platform, will allow you to work with top private equity and corporate clients across all sectors on complex and dynamic multi-billion-dollar decisions. Each client, deal, and situation is unique, and the ability to translate data into actionable insights for our clients is crucial to our continued success.

Job Description
As a Senior Associate, you'll work as part of a team of problem solvers, helping solve complex business issues from strategy to execution. PwC Professional skills and responsibilities for this management level include but are not limited to: Use feedback and reflection to develop self-awareness and personal strengths, and address development areas. Delegate to others to provide stretch opportunities, coaching them to deliver results. Demonstrate critical thinking and the ability to bring order to unstructured problems. Use a broad range of tools and techniques to extract insights from current industry or sector trends. Drive day-to-day deliverables in the team by helping in work planning, and review your work and that of others for quality, accuracy, and relevance. Contribute to practice enablement and business development activities. Learn new tools and technologies if required. Develop/implement automation solutions and capabilities that are aligned to clients' business requirements. Know how and when to use the tools available for a given situation, and be able to explain the reasons for this choice. Use straightforward communication, in a structured way, when influencing and connecting with others. Uphold the firm's code of ethics and business conduct.
Preferred Fields Of Study/Experience
Dual degree/Master's degree from reputed institutes in Data Science, Data Analytics, Finance, Accounting, Business Administration/Management, Economics, Statistics, Computer and Information Science, Management Information Systems, Engineering, or Mathematics. A total of 4-7 years of work experience in analytics consulting and/or transaction services with top consulting organizations. Experience across the entire Deals cycle (diligence, post-deal value creation, and exit preparation).

Preferred Knowledge/Skills
Our team is a blend of deals and consulting professionals with an ability to work with data and teams across our practice to bring targeted commercial and operational insights through industry-specific experience and cutting-edge techniques. We are looking for individuals who demonstrate knowledge and a proven record of success in one or both of the following areas:
Business: Experience in effectively facilitating day-to-day stakeholder interactions and relationships based in the US. Experience working on high-performing teams, preferably in data analytics, consulting, and/or private equity. Strong analytics consulting experience with a demonstrated ability to translate complex data into actionable insights. Experience working with business frameworks to analyze markets and assess company position and performance. Experience working with alternative data and market data sets to draw insight on competitive positioning and company performance. Understanding of financial statements, business cycles (revenue, supply chain, etc.), business diligence, financial modeling, valuation, etc. Experience working in a dynamic, collaborative environment and working under time-sensitive client deadlines. Provide insights by understanding the clients' businesses, their industry, and value drivers. Strong communication and proven presentation skills.
Technical: High degree of collaboration, ingenuity, and innovation to apply tools and techniques to address client questions. Ability to synthesize insights and recommendations into a tight and cohesive presentation to clients. Proven track record of data extraction/transformation, analytics, and visualization approaches and a high degree of data fluency. Proven skills in the following preferred: Alteryx, PySpark, Python, Advanced Excel, Power BI (including visualization and DAX), MS Office. Experience working on GenAI / Large Language Models (LLMs) is good to have. Experience in big data and machine learning concepts. Strong track record of leveraging data and business intelligence software to turn data into insights.

Posted 1 week ago

Apply

3.0 years

0 Lacs

Pune, Maharashtra, India

On-site

Source: LinkedIn

Key Responsibilities:
Test Strategy & Planning: Develop and implement robust test strategies, detailed test plans, and comprehensive test cases for ETL processes, data migrations, data warehouse solutions, and data lake implementations.
Ab Initio ETL Testing: Execute functional, integration, regression, and performance tests for ETL jobs developed using Ab Initio Graphical Development Environment (GDE), Co>Operating System, and plans deployed via Control Center. Validate data transformations, aggregations, and data quality rules implemented within Ab Initio graphs.
Spark Data Pipeline Testing: Perform hands-on testing of data pipelines and transformations built using Apache Spark (PySpark/Scala Spark) for large-scale data processing in batch and potentially streaming modes. Verify data correctness, consistency, and performance of Spark jobs from source to target.
Advanced Data Validation & Reconciliation: Perform extensive data validation and reconciliation activities between source, staging, and target systems using complex SQL queries. Conduct row counts, sum checks, data type validations, primary key/foreign key integrity checks, and business rule validations.
Data Quality Assurance: Identify, analyze, document, and track data quality issues, anomalies, and discrepancies across the data landscape. Collaborate closely with ETL/Spark developers, data architects, and business analysts to understand data quality requirements, identify root causes, and ensure timely resolution of defects.
Documentation & Reporting: Create and maintain detailed test documentation, including test cases, test results, defect reports, and data quality metrics dashboards. Provide clear and concise communication on test progress, defect status, and overall data quality posture to stakeholders.

Required Skills & Qualifications:
Bachelor's degree in Computer Science, Engineering, Information Technology, or a related field. 3+ years of dedicated experience in ETL/Data Warehouse testing. Strong hands-on experience testing ETL processes developed using Ab Initio (GDE, Co>Operating System). Hands-on experience in testing data pipelines built with Apache Spark (PySpark or Scala Spark). Advanced SQL skills for data querying, validation, complex joins, and comparison across heterogeneous databases (e.g., Oracle, DB2, SQL Server, Hive, etc.). Solid understanding of ETL methodologies, data warehousing concepts (Star Schema, Snowflake Schema), and data modeling principles. Experience with test management and defect tracking tools (e.g., JIRA, Azure DevOps, HP ALM). Excellent analytical, problem-solving, and communication skills, with a keen eye for detail.
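As a hedged illustration of the source-to-target reconciliation work described above, the sketch below compares row counts, aggregates, and row-level differences between a staging and a target table in PySpark. The table and column names are invented; real validations would follow the project's mapping documents.

```python
# Hypothetical reconciliation between a staging table and its warehouse target.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl_reconciliation").getOrCreate()

source = spark.table("stg.transactions")       # hypothetical staging table
target = spark.table("dwh.fact_transactions")  # hypothetical target table

# Row-count and sum checks
src_stats = source.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).first()
tgt_stats = target.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).first()
assert src_stats == tgt_stats, f"Aggregate mismatch: {src_stats} vs {tgt_stats}"

# Row-level differences in both directions (both empty means fully reconciled)
cols = ["txn_id", "account_id", "amount"]
missing_in_target = source.select(cols).exceptAll(target.select(cols))
extra_in_target = target.select(cols).exceptAll(source.select(cols))
print("missing:", missing_in_target.count(), "extra:", extra_in_target.count())
```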

Posted 1 week ago

Apply

5.0 years

0 Lacs

Pune, Maharashtra, India

On-site

Source: LinkedIn

Introduction
In this role, you'll work in one of our IBM Consulting Client Innovation Centers (Delivery Centers), where we deliver deep technical and industry expertise to a wide range of public and private sector clients around the world. Our delivery centers offer our clients locally based skills and technical expertise to drive innovation and adoption of new technology.

Your Role And Responsibilities
As a Big Data Engineer, you will develop, maintain, evaluate, and test big data solutions. You will be involved in data engineering activities like creating source-to-target pipelines/workflows and implementing solutions that address the clients' needs.

Your Primary Responsibilities Include
Design, build, optimize and support new and existing data models and ETL processes based on our clients' business requirements. Build, deploy and manage data infrastructure that can adequately handle the needs of a rapidly growing data-driven organization. Coordinate data access and security to enable data scientists and analysts to easily access data whenever they need to.

Preferred Education: Master's Degree

Required Technical And Professional Expertise
Must have 5+ years of experience in Big Data: Hadoop, Spark, Scala, Python, HBase, Hive. Good to have AWS: S3, Athena, DynamoDB, Lambda, Jenkins, Git. Developed Python and PySpark programs for data analysis. Good working experience with Python to develop a custom framework for generating rules (similar to a rules engine). Developed Python code to gather data from HBase and designed the solution to implement it using PySpark. Used Apache Spark DataFrames/RDDs to apply business transformations and utilized Hive context objects to perform read/write operations.

Preferred Technical And Professional Experience
Understanding of DevOps. Experience in building scalable end-to-end data ingestion and processing solutions. Experience with object-oriented and/or functional programming languages, such as Python, Java and Scala.
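For illustration, here is a minimal sketch of the Hive read, DataFrame transformation, and Hive write pattern mentioned above, using the modern SparkSession (which supersedes the older HiveContext). The database, table, and column names are hypothetical.

```python
# Hypothetical Hive-backed ETL: read, aggregate, write a partitioned summary table.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("hive_etl")
         .enableHiveSupport()
         .getOrCreate())

orders = spark.table("raw_db.orders")          # hypothetical Hive source table

daily_summary = (orders
                 .filter(F.col("status") == "COMPLETED")
                 .groupBy("order_date", "region")
                 .agg(F.sum("amount").alias("revenue"),
                      F.countDistinct("customer_id").alias("customers")))

(daily_summary.write
 .mode("overwrite")
 .partitionBy("order_date")
 .saveAsTable("curated_db.daily_revenue"))     # hypothetical Hive target table
```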

Posted 1 week ago

Apply

5.0 years

0 Lacs

Mumbai Metropolitan Region

On-site

Source: LinkedIn

Relocation Assistance Offered Within Country
Job Number #163961 - Mumbai, Maharashtra, India

Who We Are
Colgate-Palmolive Company is a global consumer products company operating in over 200 countries, specializing in Oral Care, Personal Care, Home Care, Skin Care, and Pet Nutrition. Our products are trusted in more households than any other brand in the world, making us a household name! Join Colgate-Palmolive, a caring, innovative growth company reimagining a healthier future for people, their pets, and our planet. Guided by our core values (Caring, Inclusive, and Courageous), we foster a culture that inspires our people to achieve common goals. Together, let's build a brighter, healthier future for all.

About Colgate-Palmolive
Do you want to come to work with a smile and leave with one as well? In between those smiles, your day consists of working in a global organization, continually learning and collaborating, having stimulating discussions, and making impactful contributions! If this is how you see your career, Colgate is the place to be! Our diligent household brands, dedicated employees, and sustainability commitments make us a company passionate about building a future to smile about for our employees, consumers, and surrounding communities. The pride in our brand fuels a workplace that encourages creative thinking, champions experimentation, and promotes authenticity, which has contributed to our enduring success. If you want to work for a company that lives by its values, then give your career a reason to smile...every single day.

The Experience
In today's dynamic analytical/technological environment, it is an exciting time to be a part of the GLOBAL ANALYTICS team at Colgate. Our highly insight-driven and innovative team is dedicated to driving growth for Colgate-Palmolive in this constantly evolving landscape. What role will you play as a member of Colgate's Analytics team? The GLOBAL DATA SCIENCE & ADVANCED ANALYTICS vertical in Colgate-Palmolive is focused on working on cases which have big $ impact and scope for scalability, with a clear focus on addressing the business questions and recommending actions. The Data Scientist position would lead GLOBAL DATA SCIENCE & ADVANCED ANALYTICS projects within the Analytics Continuum, conceptualizing and building predictive modelling, simulation, and optimization solutions for clear $ objectives and measured value. The Data Scientist would work on a range of projects across Revenue Growth Management, Market Effectiveness, Forecasting, etc., and needs to handle relationships with the Business independently and drive projects such as Price Promotion, Marketing Mix and Forecasting. Who are you...?
You are a function expert -
Leads GLOBAL DATA SCIENCE & ADVANCED ANALYTICS within the Analytics Continuum. Conceptualizes and builds predictive modelling, simulation, and optimization solutions to address business questions or use cases. Applies ML and AI to analytics algorithms to build inferential and predictive models, allowing for scalable solutions to be deployed across the business. Conducts model validations and continuous improvement of the algorithms, capabilities, or solutions built. Deploys models using Airflow and Docker on Google Cloud Platform. Develops end-to-end business solutions, from data extraction, data preparation, and data mining to statistical modeling and then building business presentations. Owns Pricing and Promotion, Marketing Mix, and Forecasting studies from scoping to delivery. Studies large amounts of data to discover trends and patterns. Mines data through various technologies like BigQuery and SQL. Presents insights in an easy-to-interpret way to the business teams. Develops visualizations (e.g., Looker, PyDash, Flask, Plotly) using large datasets. Ready to work closely with business partners across geographies.

You connect the dots -
Merge multiple data sources and build statistical models / machine learning models in price and promo elasticity modelling and marketing mix modelling to derive actionable business insights and recommendations. Assemble large, sophisticated data sets that meet functional/non-functional business requirements. Build data and visualization tools for business analytics teams to assist them in decision making.

You are a collaborator -
Work closely with Division Analytics team leads. Work with data and analytics specialists across functions to drive data solutions.

You are an innovator -
Identify, design, and implement new algorithms and process improvements, while continuously automating processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.

Qualifications
What you'll need: BE/B.Tech (Computer Science or Information Technology preferred), MBA or PGDM in Business Analytics/Data Science, additional DS certifications or courses, or MSc/MStat in Economics or Statistics. 5+ years of experience in building data models and driving insights. Hands-on experience in developing statistical models such as linear regression, ridge regression, lasso, random forest, SVM, gradient boosting, logistic regression, K-means clustering, hierarchical clustering, Bayesian regression, etc. Hands-on experience with coding languages: Python (mandatory), R, SQL, PySpark, SparkR. Good understanding of cloud frameworks (Google Cloud, Snowflake) and services like Kubernetes, Cloud Build, and Cloud Run. Knowledge of using GitHub and Airflow for coding and model execution, and of model deployment on cloud platforms. Solid understanding of tools like Looker, Domo, and Power BI, and of web app frameworks using Plotly, PyDash, and SQL. Experience front-facing business teams (a client-facing role), supporting and working with multi-functional teams in a dynamic environment.

What you'll need (Preferred): Handling, redefining, and developing statistical models for RGM/Pricing and/or Marketing Effectiveness. Experience with third-party data, i.e., syndicated market data, point of sale, etc. Working knowledge of the consumer packaged goods industry. Knowledge of machine learning techniques (clustering, decision tree learning, artificial neural networks, etc.) and their real-world advantages/drawbacks.
Experience visualizing/communicating data for partners using: Tableau, DOMO, pydash, plotly, d3.js, ggplot2, pydash, R Shiny etc Willingness and ability to experiment with new tools and techniques Ability to maintain personal composure and thoughtfully handle difficult situations. Knowledge of Google products (Big Query, data studio, colab, Google Slides, Google Sheets etc) Knowledge of deployment of models in Cloud Environment using Airflow, Docker Ability to work with cross functional teams in IT, Data Architecture to build enterprise level Data Science products. Our Commitment to Diversity, Equity & Inclusion Achieving our purpose starts with our people β€” ensuring our workforce represents the people and communities we serve β€”and creating an environment where our people feel they belong; where we can be our authentic selves, feel treated with respect and have the support of leadership to impact the business in a meaningful way. Equal Opportunity Employer Colgate is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, sexual orientation, national origin, ethnicity, age, disability, marital status, veteran status (United States positions), or any other characteristic protected by law. Reasonable accommodation during the application process is available for persons with disabilities. Please complete this request form should you require accommodation. Show more Show less

Posted 1 week ago

Apply

6.0 years

0 Lacs

Kolkata, West Bengal, India

Remote


At EY, you'll have the chance to build a career as unique as you are, with the global scale, support, inclusive culture and technology to become the best version of you. And we're counting on your unique voice and perspective to help EY become even better, too. Join us and build an exceptional experience for yourself, and a better working world for all.

The opportunity
We are seeking a highly skilled and motivated Senior DataOps Engineer with strong expertise in the Azure data ecosystem. You will play a crucial role in managing and optimizing data workflows across Azure platforms such as Azure Data Factory, Data Lake, Databricks, and Synapse. Your primary focus will be on building, maintaining, and monitoring data pipelines, ensuring high data quality, and supporting critical data operations. You'll also support visualization, automation, and CI/CD processes to streamline data delivery and reporting.

Your Key Responsibilities
Data Pipeline Management: Build, monitor, and optimize data pipelines using Azure Data Factory (ADF), Databricks, and Azure Synapse for efficient data ingestion, transformation, and storage.
ETL Operations: Design and maintain robust ETL processes for batch and real-time data processing across cloud and on-premise sources.
Data Lake Management: Organize and manage structured and unstructured data in Azure Data Lake, ensuring performance and security best practices.
Data Quality & Validation: Perform data profiling, validation, and transformation using SQL, PySpark, and Python to ensure data integrity.
Monitoring & Troubleshooting: Use logging and monitoring tools to troubleshoot failures in pipelines and address data latency or quality issues.
Reporting & Visualization: Work with Power BI or Tableau teams to support dashboard development, ensuring the availability of clean and reliable data.
DevOps & CI/CD: Support data deployment pipelines using Azure DevOps, Git, and CI/CD practices for version control and automation.
Tool Integration: Collaborate with cross-functional teams to integrate Informatica CDI or similar ETL tools with Azure components for seamless data flow.
Collaboration & Documentation: Partner with data analysts, engineers, and business stakeholders, while maintaining SOPs and technical documentation for operational efficiency.

Skills and attributes for success
Strong hands-on experience in Azure Data Factory, Azure Data Lake, Azure Synapse, and Databricks
Solid understanding of ETL/ELT design and implementation principles
Strong SQL and PySpark skills for data transformation and validation
Exposure to Python for automation and scripting
Familiarity with DevOps concepts, CI/CD workflows, and source control systems (Azure DevOps preferred)
Experience working with Power BI or Tableau for data visualization and reporting support
Strong problem-solving skills, attention to detail, and commitment to data quality
Excellent communication and documentation skills to interface with technical and business teams
Strong knowledge of asset management business operations, especially in data domains like securities, holdings, benchmarks, and pricing.
To qualify for the role, you must have
4-6 years of experience in DataOps or Data Engineering roles
Proven expertise in managing and troubleshooting data workflows within the Azure ecosystem
Experience working with Informatica CDI or similar data integration tools
Scripting and automation experience in Python/PySpark
Ability to support data pipelines in a rotational on-call or production support environment
Comfortable working in a remote/hybrid and cross-functional team setup

Technologies and Tools

Must haves
Azure Databricks: Experience in data transformation and processing using notebooks and Spark.
Azure Data Lake: Experience working with hierarchical data storage in Data Lake.
Azure Synapse: Familiarity with distributed data querying and data warehousing.
Azure Data Factory: Hands-on experience in orchestrating and monitoring data pipelines.
ETL Process Understanding: Knowledge of data extraction, transformation, and loading workflows, including data cleansing, mapping, and integration techniques.

Good to have
Power BI or Tableau for reporting support
Monitoring/logging using Azure Monitor or Log Analytics
Azure DevOps and Git for CI/CD and version control
Python and/or PySpark for scripting and data handling
Informatica Cloud Data Integration (CDI) or similar ETL tools
Shell scripting or command-line data handling
SQL (across distributed and relational databases)

What We Look For
Enthusiastic learners with a passion for DataOps practices.
Problem solvers with a proactive approach to troubleshooting and optimization.
Team players who can collaborate effectively in a remote or hybrid work environment.
Detail-oriented professionals with strong documentation skills.

What we offer
EY Global Delivery Services (GDS) is a dynamic and truly global delivery network. We work across six locations - Argentina, China, India, the Philippines, Poland and the UK - and with teams from all EY service lines, geographies and sectors, playing a vital role in the delivery of the EY growth strategy. From accountants to coders to advisory consultants, we offer a wide variety of fulfilling career opportunities that span all business disciplines. In GDS, you will collaborate with EY teams on exciting projects and work with well-known brands from across the globe. We'll introduce you to an ever-expanding ecosystem of people, learning, skills and insights that will stay with you throughout your career.
Continuous learning: You'll develop the mindset and skills to navigate whatever comes next.
Success as defined by you: We'll provide the tools and flexibility, so you can make a meaningful impact, your way.
Transformative leadership: We'll give you the insights, coaching and confidence to be the leader the world needs.
Diverse and inclusive culture: You'll be embraced for who you are and empowered to use your voice to help others find theirs.

EY | Building a better working world
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets. Enabled by data and technology, diverse EY teams in over 150 countries provide trust through assurance and help clients grow, transform and operate. Working across assurance, consulting, law, strategy, tax and transactions, EY teams ask better questions to find new answers for the complex issues facing our world today.

Posted 1 week ago

Apply

5.0 - 10.0 years

7 - 17 Lacs

Bengaluru

Work from Office


Role & responsibilities
Experience: 5 - 8 years
Employment Type: Full-Time

Job Summary:
We are looking for a highly skilled Scala and Spark Developer to join our data engineering team. The ideal candidate will have strong experience in building scalable data processing solutions using Apache Spark and writing robust, high-performance applications in Scala. You will work closely with data scientists, data analysts, and product teams to design, develop, and optimize large-scale data pipelines and ETL workflows.

Key Responsibilities:
Develop and maintain scalable data processing pipelines using Apache Spark and Scala.
Work on batch and real-time data processing using Spark (RDD/DataFrame/Dataset).
Write efficient and maintainable code following best practices and coding standards.
Collaborate with cross-functional teams to understand data requirements and implement solutions.
Optimize performance of Spark jobs and troubleshoot data-related issues.
Integrate data from multiple sources and ensure data quality and consistency.
Participate in design reviews and code reviews, and provide technical leadership when needed.
Contribute to data modeling, schema design, and architecture discussions.

Required Skills:
Strong programming skills in Scala.
Expertise in Apache Spark (Core, SQL, Streaming).
Hands-on experience with distributed computing and large-scale data processing.
Experience with data formats like Parquet, Avro, ORC, and JSON.
Good understanding of functional programming concepts.
Familiarity with data ingestion tools (Kafka, Flume, Sqoop, etc.).
Experience working with the Hadoop ecosystem (HDFS, Hive, YARN, etc.) is a plus.
Strong SQL skills and experience working with relational and NoSQL databases.
Experience with version control tools like Git.

Preferred Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Experience with cloud platforms like AWS, Azure, or GCP (especially EMR, Databricks, etc.).
Knowledge of containerization (Docker, Kubernetes) is a plus.
Familiarity with CI/CD tools and DevOps practices.

Posted 1 week ago

Apply

2.0 years

0 Lacs

Gurugram, Haryana, India

On-site


Job Description
Circle K (part of the Alimentation Couche-Tard group) is a global leader in the convenience store and fuel space, with a footprint across 31 countries and territories. At the Circle K Business Centre in India, we are #OneTeam using the power of data analytics to drive our decisions and strengthen Circle K's global capabilities. We make it easy for our customers all over the world: we partner with the business to empower the right decisions and deliver effectively, while rapidly unlocking value for our customers across the enterprise. Our team in India is an integral part of our talent ecosystem that helps advance us on our journey to becoming a data-centric company. The future of data analytics at Circle K is bright, and we're only just getting started.

About The Role
The India Data & Analytics Global Capability Centre is an integral part of ACT's Global Data & Analytics Team, and the Data Analyst will be a key player on this team that will help grow analytics globally at ACT. The hired candidate will partner with multiple departments, including Global Marketing, Merchandising, Global Technology, and Business Units. The incumbent will be responsible for deploying analytics algorithms and tools on the chosen tech stack for efficient and effective delivery. Responsibilities include delivering insights and targeted action plans, addressing specific areas of risk and opportunity, working cross-functionally with business and technology teams, and leveraging the support of global teams for analysis and data.

Roles & Responsibilities

Analytics (Data & Insights)
Clean and organize large datasets for analysis and visualization using statistical methods; verify and ensure accuracy, integrity, and consistency of data
Identify trends and patterns in data and use this information to drive business decisions
Create requirement artefacts, e.g., functional specification documents, use cases, requirement traceability matrices, business test cases, process mapping documents, and user stories for analytics projects
Build highly impactful and intuitive dashboards that bring the underlying data to life through insights
Generate ad-hoc analysis for leadership to deliver relevant, action-oriented, and innovative recommendations

Operational Excellence
Improve data quality by using and improving tools to automatically detect issues
Develop analytical solutions or dashboards using user-centric design techniques in alignment with ACT's protocol
Study industry/organization benchmarks and design/develop analytical solutions to monitor or improve business performance across retail, marketing, and other business areas

Stakeholder Management
Work with high-performing Functional Consultants, Data Engineers, and cross-functional teams to lead / support the complete lifecycle of visual analytical applications, from development of mock-ups and storyboards to complete production-ready applications
Provide regular updates to stakeholders to simplify and clarify complex concepts, and communicate the output of work to the business
Create compelling documentation or artefacts that connect the business to the solutions
Coordinate internally to share key learnings with other teams and lead to accelerated business performance

Behavioral Skills
Delivery Excellence
Business disposition
Social intelligence
Innovation and agility

Knowledge
Functional Analytics (Retail Analytics, Supply Chain Analytics, Marketing Analytics, Customer Analytics, etc.)
Working understanding of statistical modelling using analytical tools (Python, PySpark, R, etc.)
Enterprise reporting systems, relational (MySQL, Microsoft SQL Server, etc.) and non-relational (MongoDB, DynamoDB) database management systems
Business intelligence & reporting (Power BI, Tableau, Alteryx, etc.)
Cloud computing services in Azure/AWS/GCP for analytics

Education
Bachelor's degree in Computer Science, Information Management, or related technical fields

Experience
2+ years for Data Analyst
Relevant working experience in a quantitative/applied analytics role
Experience with programming and the ability to quickly pick up handling large data volumes with modern data processing tools, e.g. Spark / SQL / Python

Posted 1 week ago

Apply

6.0 - 9.0 years

0 Lacs

Gurugram, Haryana, India

On-site


Summary

Position Summary
Strategy & Analytics
AI & Data
In this age of disruption, organizations need to navigate the future with confidence, embracing decision making with clear, data-driven choices that deliver enterprise value in a dynamic business environment. The AI & Data team leverages the power of data, analytics, robotics, science and cognitive technologies to uncover hidden relationships from vast troves of data, generate insights, and inform decision-making. Together with the Strategy practice, our Strategy & Analytics portfolio helps clients transform their business by architecting organizational intelligence programs and differentiated strategies to win in their chosen markets.

AI & Data will work with our clients to:
Implement large-scale data ecosystems, including data management, governance, and the integration of structured and unstructured data to generate insights leveraging cloud-based platforms
Leverage automation, cognitive and science-based techniques to manage data, predict scenarios and prescribe actions
Drive operational efficiency by maintaining their data ecosystems, sourcing analytics expertise and providing As-a-Service offerings for continuous insights and improvements

PySpark Sr. Consultant
The position is suited for individuals who have a demonstrated ability to work effectively in a fast-paced, high-volume, deadline-driven environment.

Education And Experience
Education: B.Tech/M.Tech/MCA/MS
6-9 years of experience in the design and implementation of migrating an enterprise legacy system to a Big Data ecosystem for a data warehousing project.

Required Skills
Must have excellent knowledge of Apache Spark and Python programming experience
Deep technical understanding of distributed computing and broader awareness of different Spark versions
Strong UNIX operating system concepts and shell scripting knowledge
Hands-on experience using Spark & Python
Deep experience in developing data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
Experience in deploying and operationalizing the code; knowledge of scheduling tools like Airflow, Control-M, etc. is preferred
Working experience on the AWS ecosystem, Google Cloud, BigQuery, etc. is an added advantage
Hands-on experience with AWS S3 filesystem operations
Good knowledge of Hadoop, Hive and Cloudera/Hortonworks Data Platform
Should have exposure to Jenkins or an equivalent CI/CD tool and a Git repository
Experience handling CDC operations for huge volumes of data
Should understand and have operating experience with the Agile delivery model
Should have experience in Spark-related performance tuning
Should be well versed with design documents like HLD, TDD, etc.
Should be well versed with data historical loads and overall framework concepts
Should have participated in different kinds of testing like Unit Testing, System Testing, User Acceptance Testing, etc.

Preferred Skills
Exposure to PySpark, Cloudera/Hortonworks, Hadoop and Hive.
Exposure to AWS S3/EC2 and Apache Airflow
Participation in client interactions/meetings is desirable.
Participation in code-tuning is desirable.

Recruiting tips
From developing a standout resume to putting your best foot forward in the interview, we want you to feel prepared and confident as you explore opportunities at Deloitte. Check out recruiting tips from Deloitte recruiters.

Benefits
At Deloitte, we know that great people make a great organization.
We value our people and offer employees a broad range of benefits. Learn more about what working at Deloitte can mean for you.

Our people and culture
Our inclusive culture empowers our people to be who they are, contribute their unique perspectives, and make a difference individually and collectively. It enables us to leverage different ideas and perspectives, and bring more creativity and innovation to help solve our clients' most complex challenges. This makes Deloitte one of the most rewarding places to work.

Our purpose
Deloitte's purpose is to make an impact that matters for our people, clients, and communities. At Deloitte, purpose is synonymous with how we work every day. It defines who we are. Our purpose comes through in our work with clients that enables impact and value in their organizations, as well as through our own investments, commitments, and actions across areas that help drive positive outcomes for our communities.

Professional development
From entry-level employees to senior leaders, we believe there's always room to learn. We offer opportunities to build new skills, take on leadership opportunities, and connect and grow through mentorship. From on-the-job learning experiences to formal development programs, our professionals have a variety of opportunities to continue to grow throughout their careers.

Requisition code: 300041

Posted 1 week ago

Apply

Exploring PySpark Jobs in India

PySpark, the Python API for Apache Spark's distributed data processing engine, is in high demand in the Indian job market. With the increasing need for big data processing and analysis, companies are actively seeking professionals with PySpark skills to join their teams. If you are a job seeker looking to excel in the field of big data and analytics, exploring PySpark jobs in India could be a great career move. The minimal sketch below shows what working with PySpark looks like.
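
A minimal sketch, assuming a local PySpark installation (pip install pyspark); the cities and job counts are made-up sample data for illustration only:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Start a local Spark session and build a tiny DataFrame in memory.
    spark = SparkSession.builder.appName("pyspark-intro").getOrCreate()
    jobs = spark.createDataFrame(
        [("Bangalore", 12), ("Pune", 8), ("Bangalore", 5)],
        ["city", "openings"],
    )

    # Aggregate with the DataFrame API, the most common day-to-day use of PySpark.
    jobs.groupBy("city").agg(F.sum("openings").alias("total_openings")).show()

    spark.stop()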

Top Hiring Locations in India

Here are 5 major cities in India where companies are actively hiring for PySpark roles:

1. Bangalore
2. Pune
3. Hyderabad
4. Mumbai
5. Delhi

Average Salary Range

The estimated salary range for PySpark professionals in India varies based on experience levels. Entry-level positions can expect to earn around INR 6-8 lakhs per annum, while experienced professionals can earn upwards of INR 15 lakhs per annum.

Career Path

In the field of PySpark, a typical career progression may look like this:

1. Junior Developer
2. Data Engineer
3. Senior Developer
4. Tech Lead
5. Data Architect

Related Skills

In addition to PySpark, professionals in this field are often expected to have or develop skills in:

  • Python programming
  • Apache Spark
  • Big data technologies (Hadoop, Hive, etc.)
  • SQL
  • Data visualization tools (Tableau, Power BI)

Interview Questions

Here are 25 interview questions you may encounter when applying for PySpark roles; a short, illustrative code sketch after the list grounds a few of the basics:

  • Explain what PySpark is and its main features (basic)
  • What are the advantages of using PySpark over other big data processing frameworks? (medium)
  • How do you handle missing or null values in PySpark? (medium)
  • What is RDD in PySpark? (basic)
  • What is a DataFrame in PySpark and how is it different from an RDD? (medium)
  • How can you optimize performance in PySpark jobs? (advanced)
  • Explain the difference between map and flatMap transformations in PySpark (basic)
  • What is the role of a SparkContext in PySpark? (basic)
  • How do you handle schema inference in PySpark? (medium)
  • What is a SparkSession in PySpark? (basic)
  • How do you join DataFrames in PySpark? (medium)
  • Explain the concept of partitioning in PySpark (medium)
  • What is a UDF in PySpark? (medium)
  • How do you cache DataFrames in PySpark for optimization? (medium)
  • Explain the concept of lazy evaluation in PySpark (medium)
  • How do you handle skewed data in PySpark? (advanced)
  • What is checkpointing in PySpark and how does it help in fault tolerance? (advanced)
  • How do you tune the performance of a PySpark application? (advanced)
  • Explain the use of Accumulators in PySpark (advanced)
  • How do you handle broadcast variables in PySpark? (advanced)
  • What are the different data sources supported by PySpark? (medium)
  • How can you run PySpark on a cluster? (medium)
  • What is the purpose of the PySpark MLlib library? (medium)
  • How do you handle serialization and deserialization in PySpark? (advanced)
  • What are the best practices for deploying PySpark applications in production? (advanced)
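
To make a few of the topics above concrete, here is a minimal, hedged PySpark sketch covering null handling, a DataFrame join, caching, a simple UDF, and the map vs flatMap distinction. The tables, column names, and values are invented for illustration only:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("pyspark-interview-prep").getOrCreate()

    orders = spark.createDataFrame(
        [(1, "Bangalore", None), (2, "Pune", 250.0)], ["id", "city", "amount"]
    )
    cities = spark.createDataFrame(
        [("Bangalore", "KA"), ("Pune", "MH")], ["city", "state"]
    )

    # Handling nulls: fill the missing amount with 0.
    clean = orders.fillna({"amount": 0.0})

    # Joining DataFrames, then caching the result that will be reused.
    joined = clean.join(cities, on="city", how="left").cache()

    # A simple UDF (built-in functions are usually preferred for performance).
    shout = F.udf(lambda s: s.upper(), StringType())
    joined.withColumn("city_upper", shout("city")).show()

    # map vs flatMap on an RDD: flatMap flattens the per-element lists.
    rdd = spark.sparkContext.parallelize(["a b", "c"])
    print(rdd.map(lambda s: s.split()).collect())      # [['a', 'b'], ['c']]
    print(rdd.flatMap(lambda s: s.split()).collect())  # ['a', 'b', 'c']

    spark.stop()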

Closing Remark

As you explore PySpark jobs in India, remember to prepare thoroughly for interviews and showcase your expertise confidently. With the right skills and knowledge, you can excel in this field and advance your career in the world of big data and analytics. Good luck!


Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

