
3895 Pyspark Jobs - Page 35

JobPe aggregates job listings for easy access; applications are submitted directly on the original job portal.

5.0 - 9.0 years

12 - 22 Lacs

Hyderabad, Pune, Bengaluru

Hybrid

Source: Naukri

Job description: We are looking for Azure Data Engineers with a minimum of 5 to 9 years of experience. The role blends technical expertise with analytical problem-solving and collaboration with cross-functional teams. Role & responsibilities: Design and implement Azure data engineering solutions (ingestion and curation). Create and maintain Azure data solutions including Azure SQL Database, Azure Data Lake, and Azure Blob Storage. Design, implement, and maintain data pipelines for data ingestion, processing, and transformation in Azure. Create and maintain ETL (Extract, Transform, Load) operations using Azure Data Factory or comparable technologies. Use Azure Data Factory and Databricks to assemble large, complex data sets. Implement data validation and cleansing procedures to ensure the quality, integrity, and dependability of the data. Ensure data quality, security, and compliance. Optimize Azure SQL databases for efficient query performance. Collaborate with data engineers and other stakeholders to understand requirements and translate them into scalable and reliable data platform architectures.
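For illustration only: a minimal PySpark sketch of the ingestion-and-curation step this listing describes. The storage paths, column names, and cleansing rules are assumptions, not the employer's actual pipeline, and it presumes a Spark session with access to Azure Data Lake (ABFS).

# Minimal ingest-and-curate sketch; paths and schema are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("azure-ingest-curate").getOrCreate()

# Ingest raw CSV landed in Azure Data Lake (abfss path is a placeholder).
raw = (spark.read.option("header", True)
       .csv("abfss://raw@examplelake.dfs.core.windows.net/sales/"))

# Basic validation and cleansing: drop rows missing keys, normalise types.
curated = (raw.dropna(subset=["order_id", "order_date"])
           .withColumn("order_date", F.to_date("order_date"))
           .withColumn("amount", F.col("amount").cast("double"))
           .dropDuplicates(["order_id"]))

# Write to the curated zone, partitioned for query performance.
(curated.write.mode("overwrite")
 .partitionBy("order_date")
 .parquet("abfss://curated@examplelake.dfs.core.windows.net/sales/"))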

Posted 1 week ago

Apply

5.0 - 9.0 years

12 - 22 Lacs

Hyderabad, Pune, Bengaluru

Hybrid

Source: Naukri

Requirements: 5 to 9 years of significant experience in designing and implementing scalable data engineering solutions on AWS. Strong proficiency in the Python programming language. Expertise in serverless architecture and AWS services such as Lambda, Glue, Redshift, Kinesis, SNS, SQS, and CloudFormation. Experience with Infrastructure as Code (IaC) using AWS CDK for defining and provisioning AWS resources. Proven leadership skills with the ability to mentor and guide junior team members. Excellent understanding of data modeling concepts and experience with tools like ERStudio. Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment. Experience with Apache Airflow for orchestrating data pipelines is a plus. Knowledge of Data Lakehouse architectures, dbt, or the Apache Hudi data format is a plus. Roles and responsibilities: Design, develop, test, deploy, and maintain large-scale data pipelines using AWS services such as S3, Glue, Lambda, and Redshift. Collaborate with cross-functional teams to gather requirements and design solutions that meet business needs. Desired candidate profile: 5-9 years of experience in an IT industry setting with expertise in Python (PySpark). Strong understanding of the AWS ecosystem including S3, Glue, Lambda, and Redshift. Bachelor's degree in any specialization (B.Tech/B.E.).
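For context, a hedged sketch of the Infrastructure-as-Code requirement using AWS CDK v2 in Python. The stack, bucket, and function names are invented for illustration; this is not the employer's actual infrastructure.

# Minimal CDK v2 stack: a landing bucket plus a serverless transform hook.
from aws_cdk import App, Stack, aws_s3 as s3, aws_lambda as _lambda
from constructs import Construct

class IngestStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Landing bucket for raw files (physical name auto-generated by CDK).
        bucket = s3.Bucket(self, "RawLanding")
        # Inline handler keeps the sketch self-contained.
        fn = _lambda.Function(
            self, "Transform",
            runtime=_lambda.Runtime.PYTHON_3_11,
            handler="index.handler",
            code=_lambda.Code.from_inline(
                "def handler(event, context):\n    return {'ok': True}\n"),
        )
        bucket.grant_read(fn)  # least-privilege read access for the function

app = App()
IngestStack(app, "ingest-dev")
app.synth()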

Posted 1 week ago

Apply

4.0 years

0 Lacs

Kochi, Kerala, India

On-site

Source: LinkedIn

Introduction: In this role, you'll work in one of our IBM Consulting Client Innovation Centers (Delivery Centers), where we deliver deep technical and industry expertise to a wide range of public and private sector clients around the world. Our delivery centers offer our clients locally based skills and technical expertise to drive innovation and adoption of new technology. Your role and responsibilities: As a Data Engineer, you will develop, maintain, evaluate, and test big data solutions. You will be involved in the development of data solutions using the Spark framework with Python or Scala on Hadoop and the AWS Cloud Data Platform. Responsibilities: Build data pipelines to ingest, process, and transform data from files, streams, and databases. Process data with Spark, Python, PySpark, Scala, and Hive, HBase, or other NoSQL databases on cloud data platforms (AWS) or HDFS. Develop efficient software code for multiple use cases leveraging the Spark framework with Python or Scala and big data technologies. Develop streaming pipelines. Work with Hadoop/AWS ecosystem components to implement scalable solutions that meet ever-increasing data volumes, using big data and cloud technologies such as Apache Spark and Kafka. Preferred education: Master's degree. Required technical and professional expertise: Minimum 4+ years of experience in big data technologies with extensive data engineering experience in Spark with Python or Scala. Minimum 3 years of experience on cloud data platforms on AWS. Experience with AWS EMR, AWS Glue, Databricks, AWS Redshift, and DynamoDB. Good to excellent SQL skills. Exposure to streaming solutions and message brokers such as Kafka. Preferred technical and professional experience: Certification in AWS and Databricks, or Cloudera Certified Spark Developer.
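A minimal sketch of the kind of streaming ingestion pipeline this listing mentions, using Spark Structured Streaming to read from Kafka. The broker address, topic, and S3 paths are placeholders, and the job assumes the spark-sql-kafka package is available on the cluster.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

# Read a Kafka topic as a stream (broker and topic are placeholders).
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS payload", "timestamp"))

# Land the raw stream with a checkpoint so the job can restart safely.
query = (events.writeStream.format("parquet")
         .option("path", "s3a://example-bucket/raw/events/")
         .option("checkpointLocation", "s3a://example-bucket/chk/events/")
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()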

Posted 1 week ago

Apply

10.0 - 18.0 years

30 - 45 Lacs

Hyderabad

Remote

Source: Naukri

About Syren Cloud: Syren Cloud Technologies is a cutting-edge company specializing in supply chain solutions and data engineering. Its intelligent insights, powered by technologies like AI and NLP, empower organizations with real-time visibility and proactive decision-making. From control towers to agile inventory management, Syren unlocks unparalleled success in supply chain management. Role summary: An Azure Data Architect is responsible for designing, implementing, and maintaining the data infrastructure within an organization. They collaborate with both business and IT teams to understand stakeholders' needs and unlock the full potential of data. They create conceptual and logical data models, analyze structural requirements, and ensure efficient database solutions. Job responsibilities: • Act as a subject matter expert providing best-practice guidance on data lake and ETL architecture frameworks suitable for handling big data, both unstructured and structured • Drive business and service-layer development with the customer by finding new opportunities, expanding existing solutions, and creating new ones • Provide hands-on subject matter expertise to build and implement Azure-based big data solutions • Research, evaluate, architect, and deploy new tools, frameworks, and patterns to build sustainable big data platforms for our clients • Facilitate and/or conduct requirements workshops • Collaborate on the prioritization of technical requirements • Collaborate with peer teams and vendors on the solution and delivery • Hold overall accountability for project delivery • Work collaboratively with Product Management, Data Management, and other architects to deliver the cloud data platform and Data as a Service • Consult with clients to assess current problem states, define desired future states, define solution architecture, and make solution recommendations

Posted 1 week ago

Apply

4.0 - 7.0 years

13 - 17 Lacs

Pune

Work from Office

Source: Naukri

Overview: The Data Technology team at MSCI is responsible for meeting the data requirements across various business areas, including Index, Analytics, and Sustainability. Our team collates data from multiple sources such as vendors (e.g., Bloomberg, Reuters), website acquisitions, and web scraping (e.g., financial news sites, company websites, exchange websites, filings). This data can be in structured or semi-structured formats. We normalize the data, perform quality checks, assign internal identifiers, and release it to downstream applications. Responsibilities: As data engineers, we build scalable systems to process data in various formats and volumes, ranging from megabytes to terabytes. Our systems perform quality checks, match data across various sources, and release it in multiple formats. We leverage the latest technologies, sources, and tools to process the data. Some of the exciting technologies we work with include Snowflake, Databricks, and Apache Spark. Qualifications: Core Java, Spring Boot, Apache Spark, Spring Batch, Python. Exposure to SQL databases such as Oracle, MySQL, and Microsoft SQL Server is a must. Experience, knowledge, or certification in cloud technology, preferably Microsoft Azure or Google Cloud Platform, is good to have. Exposure to NoSQL databases such as Neo4j or document databases is also good to have. What we offer you: Transparent compensation schemes and comprehensive employee benefits, tailored to your location, ensuring your financial security, health, and overall wellbeing. Flexible working arrangements, advanced technology, and collaborative workspaces. A culture of high performance and innovation where we experiment with new ideas and take responsibility for achieving results. A global network of talented colleagues who inspire, support, and share their expertise to innovate and deliver for our clients. A Global Orientation program to kickstart your journey, followed by access to our Learning@MSCI platform, LinkedIn Learning Pro, and tailored learning opportunities for ongoing skills development. Multi-directional career paths that offer professional growth and development through new challenges, internal mobility, and expanded roles. We actively nurture an environment that builds a sense of inclusion, belonging, and connection, including eight Employee Resource Groups: All Abilities, Asian Support Network, Black Leadership Network, Climate Action Network, Hola! MSCI, Pride & Allies, Women in Tech, and Women's Leadership Forum. At MSCI we are passionate about what we do, and we are inspired by our purpose: to power better investment decisions. You'll be part of an industry-leading network of creative, curious, and entrepreneurial pioneers. This is a space where you can challenge yourself, set new standards and perform beyond expectations for yourself, our clients, and our industry. MSCI is a leading provider of critical decision support tools and services for the global investment community. With over 50 years of expertise in research, data, and technology, we power better investment decisions by enabling clients to understand and analyze key drivers of risk and return and confidently build more effective portfolios. We create industry-leading research-enhanced solutions that clients use to gain insight into and improve transparency across the investment process. MSCI Inc. is an equal opportunity employer.
It is the policy of the firm to ensure equal employment opportunity without discrimination or harassment on the basis of race, color, religion, creed, age, sex, gender, gender identity, sexual orientation, national origin, citizenship, disability, marital and civil partnership/union status, pregnancy (including unlawful discrimination on the basis of a legally protected parental leave), veteran status, or any other characteristic protected by law. MSCI is also committed to working with and providing reasonable accommodations to individuals with disabilities. If you are an individual with a disability and would like to request a reasonable accommodation for any part of the application process, please email Disability.Assistance@msci.com and indicate the specifics of the assistance needed. Please note, this e-mail is intended only for individuals who are requesting a reasonable workplace accommodation; it is not intended for other inquiries. To all recruitment agencies MSCI does not accept unsolicited CVs/Resumes. Please do not forward CVs/Resumes to any MSCI employee, location, or website. MSCI is not responsible for any fees related to unsolicited CVs/Resumes. Note on recruitment scams We are aware of recruitment scams where fraudsters impersonating MSCI personnel may try and elicit personal information from job seekers. Read our full note on careers.msci.com

Posted 1 week ago

Apply

5.0 - 10.0 years

5 - 9 Lacs

Bengaluru

Work from Office

Source: Naukri

Position: Senior Data Engineer - Airflow, PL/SQL. Experience: 5+ years. Location: Bangalore/Hyderabad/Pune. Seeking a Senior Data Engineer with strong expertise in Apache Airflow and Oracle PL/SQL, along with working experience in Snowflake and Agile methodologies. The ideal candidate will also take up Scrum Master responsibilities and lead a data engineering scrum team to deliver robust, scalable data solutions. Key responsibilities: Design, develop, and maintain scalable data pipelines using Apache Airflow. Write and optimize complex PL/SQL queries, procedures, and packages on Oracle databases. Collaborate with cross-functional teams to design efficient data models and integration workflows. Work with Snowflake for data warehousing and analytics use cases. Own the delivery of sprint goals, backlog grooming, and facilitation of agile ceremonies as the Scrum Master. Monitor pipeline health and troubleshoot production data issues proactively. Ensure code quality, documentation, and best practices across the team. Mentor junior data engineers and promote a culture of continuous improvement. Required skills and qualifications: 5+ years of experience as a Data Engineer in enterprise environments. Strong expertise in Apache Airflow for orchestrating workflows. Expert in Oracle PL/SQL: stored procedures, performance tuning, debugging. Hands-on experience with Snowflake: data modeling, SQL, optimization. Working knowledge of version control (Git) and CI/CD practices. Prior experience or certification as a Scrum Master is highly desirable. Strong analytical and problem-solving skills with attention to detail. Excellent communication and leadership skills.
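A minimal Apache Airflow sketch of the orchestration pattern this role centres on: an Oracle PL/SQL extract followed by a Snowflake load. The task bodies are stubs, and the DAG id and schedule are assumptions; in practice the Oracle and Snowflake provider hooks or operators would replace the print calls.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def run_plsql_extract(**context):
    # Placeholder: a real task would invoke an Oracle stored procedure via a hook.
    print("extracting via PL/SQL package")

def load_to_snowflake(**context):
    # Placeholder: a real task would use the Snowflake provider to load data.
    print("loading curated data into Snowflake")

with DAG(
    dag_id="oracle_to_snowflake_daily",   # assumed name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                    # Airflow 2.4+ schedule syntax
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=run_plsql_extract)
    load = PythonOperator(task_id="load", python_callable=load_to_snowflake)
    extract >> load  # load runs only after the extract succeeds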

Posted 1 week ago

Apply

2.0 - 4.0 years

8 - 12 Lacs

Mumbai

Work from Office

Source: Naukri

The SAS to Databricks Migration Developer will be responsible for migrating existing SAS code, data processes, and workflows to the Databricks platform. This role requires expertise in both SAS and Databricks, with a focus on converting SAS logic into scalable PySpark and Python code. The developer will design, implement, and optimize data pipelines, ensuring seamless integration and functionality within the Databricks environment. Collaboration with various teams is essential to understand data requirements and deliver solutions that meet business needs.
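To make the SAS-to-PySpark conversion concrete, here is a hedged sketch of how a simple SAS DATA step and a PROC MEANS-style summary might map to PySpark. The table and column names are invented for illustration.

# A SAS DATA step like:
#   data work.high_value;
#     set src.orders;
#     where amount > 1000;
#     margin = amount - cost;
#   run;
# maps to PySpark roughly as follows (tables and columns are assumed):
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sas-migration").getOrCreate()

orders = spark.table("src.orders")
high_value = (orders.filter(F.col("amount") > 1000)
              .withColumn("margin", F.col("amount") - F.col("cost")))

# A PROC MEANS-style aggregation becomes a groupBy:
summary = high_value.groupBy("region").agg(F.avg("margin").alias("avg_margin"))
summary.write.mode("overwrite").saveAsTable("work.high_value_summary")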

Posted 1 week ago

Apply

5.0 - 8.0 years

15 - 27 Lacs

Bengaluru

Work from Office

Source: Naukri

Strong experience with Python, SQL, PySpark, and AWS Glue. Good to have: shell scripting, Kafka. Good knowledge of DevOps pipeline usage (Jenkins, Bitbucket, EKS, Lightspeed). Experience with AWS tools (AWS S3, EC2, Athena, Redshift, Glue, EMR, Lambda, RDS, Kinesis, DynamoDB, QuickSight, etc.). Orchestration using Airflow. Good to have: streaming technologies and processing engines such as Kinesis, Kafka, Pub/Sub, and Spark Streaming. Good debugging skills. Should have a strong hands-on design and engineering background in AWS, across a wide range of AWS services, with the ability to demonstrate working on large engagements. Strong experience with and implementation of data lakes, data warehousing, and data lakehouse architectures. Ensure data accuracy, integrity, privacy, security, and compliance through quality control procedures. Monitor data systems performance and implement optimization strategies. Leverage data controls to maintain data privacy, security, compliance, and quality for allocated areas of ownership. Demonstrable knowledge of applying data engineering best practices (coding practices, unit testing, version control, code review). Experience in the insurance domain preferred.
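A minimal AWS Glue PySpark job skeleton of the kind this stack implies; the catalog database, table name, and S3 path are placeholders, not the employer's actual resources.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job bootstrap: resolve arguments and initialise contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog (database/table names are placeholders).
src = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="transactions")

# Convert to a DataFrame for Spark SQL-style transforms, then land as parquet.
df = src.toDF().dropDuplicates(["txn_id"])
df.write.mode("append").parquet("s3://example-bucket/curated/transactions/")

job.commit()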

Posted 1 week ago

Apply

3.0 - 6.0 years

0 - 3 Lacs

Navi Mumbai, Mumbai (All Areas)

Work from Office

Source: Naukri

Job title: Data Scientist. Location: Navi Mumbai. Duration: Full-time. Positions: Multiple. We are looking for highly skilled Data Scientists with deep expertise in time series forecasting, particularly in demand forecasting and customer lifetime value (CLV) analytics. The ideal candidate will be proficient in Python or PySpark, have hands-on experience with tools like Prophet and ARIMA, and be comfortable working in Databricks environments. Familiarity with classic ML models and optimization techniques is a plus. Key responsibilities: • Develop, deploy, and maintain time series forecasting models (Prophet, ARIMA, etc.) for demand forecasting and customer behavior modeling. • Design and implement Customer Lifetime Value (CLV) models to drive customer retention and engagement strategies. • Process and analyze large datasets using PySpark or Python (Pandas). • Partner with cross-functional teams to identify business needs and translate them into data science solutions. • Leverage classic ML techniques (classification, regression) and boosting algorithms (e.g., XGBoost, LightGBM) to support broader analytics use cases. • Use Databricks for collaborative development, data pipelines, and model orchestration. • Apply optimization techniques where relevant to improve forecast accuracy and business decision-making. • Present actionable insights and communicate model results effectively to technical and non-technical stakeholders. Required qualifications: • Strong experience in time series forecasting, with hands-on knowledge of Prophet, ARIMA, or equivalent - mandatory. • Proven track record in demand forecasting - highly preferred. • Experience in modeling Customer Lifetime Value (CLV) or similar customer analytics use cases - highly preferred. • Proficiency in Python (Pandas) or PySpark - mandatory. • Experience with Databricks - mandatory. • Solid foundation in statistics, predictive modeling, and machine learning.
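A small, hedged Prophet sketch of the demand-forecasting workflow named above; the daily series is synthetic and the 28-day horizon is an arbitrary choice, not the team's actual setup.

import pandas as pd
from prophet import Prophet

# Prophet expects columns named ds (date) and y (value); this series is a toy
# with weekly seasonality baked in.
history = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=365, freq="D"),
    "y": [100 + (i % 7) * 5 for i in range(365)],
})

model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
model.fit(history)

future = model.make_future_dataframe(periods=28)   # 4-week demand horizon
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())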

Posted 1 week ago

Apply

12.0 - 17.0 years

10 - 14 Lacs

Noida

Work from Office

Source: Naukri

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by diversity and inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health equity on a global scale. Join us to start Caring. Connecting. Growing together. At Optum AI, we leverage data and resources to make a significant impact on the healthcare system. Our solutions have the potential to improve healthcare for everyone. We work on cutting-edge projects involving ML, NLP, and LLM techniques, continuously developing and improving generative AI methods for structured and unstructured healthcare data. Our team collaborates with world-class experts and top universities to develop innovative AI/ML solutions, often leading to patents and published papers. Primary responsibilities: Develop and implement AI and machine learning strategies for several healthcare domains. Collaborate with cross-functional teams to identify and prioritize AI and machine learning initiatives. Manage the development and deployment of AI and machine learning solutions. Develop and run pipelines for data ingress and model output egress. Develop and run scripts for ML model inference. Design, implement, and maintain CI/CD pipelines for MLOps and DevOps functions. Identify technical problems and develop software updates and fixes. Develop scripts or tools to automate repetitive tasks. Automate the provisioning and configuration of infrastructure resources. Provide guidance on the best use of specific tools or technologies to achieve desired results. Create documentation for infrastructure design and deployment procedures. Utilize AI/ML frameworks and tools such as MLflow, TensorFlow, PyTorch, Keras, Scikit-learn, etc. Lead and manage AI/ML teams and projects from ideation to delivery and evaluation. Apply expertise in various AI/ML techniques, including deep learning, NLP, computer vision, recommender systems, reinforcement learning, and large language models. Communicate complex AI/ML concepts and results to technical and non-technical audiences effectively. Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regard to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so. Required qualifications: Bachelor's or Master's degree in computer science, engineering, mathematics, statistics, or a related discipline. 12+ years of experience in software engineering, data science, or analytics, with 8+ years of experience in AI/ML engineering or related fields. Experience with cloud platforms and services, such as AWS, Azure, GCP, etc.
Experience in developing solutions in the NLP space and relevant projects. Hands-on experience in AI, driving the development of innovative AI and machine learning solutions. Demonstrated experience in leading and managing AI/ML teams and projects, from ideation to delivery and evaluation. Experience with Azure development environments. Knowledge of NLP literature, thrust areas, conference venues, and code repositories. Familiarity with both open-source and OpenAI LLMs and RAG architecture. Familiarity with UI and serving tools such as Streamlit, Flask, FastAPI, REST APIs, and Docker containers. Understanding of common NLP tasks such as text classification, entity recognition, entity extraction, and question answering. Proficient in Python and one of PySpark or Scala; familiarity with Python tools for data processing. Proficiency in multiple machine learning and AI techniques such as supervised, unsupervised, and reinforcement learning, deep learning, and NLP. Proficiency in Python, R, or other programming languages for data analysis and AI/ML development. Proficiency in libraries such as Hugging Face and the OpenAI API. Proven ability to develop and deploy data pipelines, machine learning models, or applications on cloud platforms (Azure, Databricks, Azure ML). Proven excellent communication, presentation, and interpersonal skills, with the ability to explain complex AI/ML concepts and results to technical and non-technical audiences. Proven solid analytical, problem-solving, and decision-making skills, with the ability to balance innovation and pragmatism. Proven passion for learning and staying updated with the latest AI/ML trends and research.
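The MLOps duties above (CI/CD for training and inference, tools such as MLflow) typically revolve around experiment tracking. A minimal, hedged sketch of MLflow tracking with scikit-learn follows; the experiment name and toy dataset are illustrative, not Optum's actual setup.

import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real healthcare dataset.
X, y = make_classification(n_samples=1_000, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

mlflow.set_experiment("demo-classifier")  # assumed experiment name
with mlflow.start_run():
    model = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    mlflow.log_param("max_iter", 500)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, "model")  # logged artifact for later inference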

Posted 1 week ago

Apply

6.0 - 9.0 years

30 - 45 Lacs

Bengaluru

Work from Office

Source: Naukri

Responsibilities: Contribute to and build an internal product library focused on solving business problems related to prediction and recommendation. Research unfamiliar methodologies and techniques to fine-tune existing models in the product suite, and recommend better solutions and/or technologies. Improve features of the product to include newer machine learning algorithms for use cases such as product recommendation, real-time predictions, fraud detection, and offer personalization. Collaborate with client teams to onboard data, build models, and score predictions. Participate in building automations and standalone applications around machine learning algorithms to enable a one-click solution to getting predictions and recommendations. Analyze large datasets, perform data wrangling operations, apply statistical treatments to filter and fine-tune input data, engineer new features, and eventually aid the process of building machine learning models. Run test cases to tune existing models for performance, check criteria, and define thresholds for success by scaling the input data to multifold. Demonstrate a basic understanding of different machine learning concepts such as regression, classification, matrix factorization, and k-fold validation, and of algorithms such as decision trees, random forest, and k-means clustering. Demonstrate working knowledge of, and contribute to, building models using deep learning techniques, ensuring robust, scalable and high-performance solutions. Minimum qualifications: Education: A Master's or PhD in a quantitative discipline (Statistics, Economics, Mathematics, Computer Science) is highly preferred. Deep learning mastery: Extensive experience with deep learning frameworks (TensorFlow, PyTorch, or Keras) and advanced deep learning projects across various domains, with a focus on multimodal data applications. Generative AI expertise: Proven experience with generative AI models and techniques, such as RAG, VAEs, and Transformers, and applications at scale in content creation or data augmentation. Programming and big data: Expert-level proficiency in Python and big data/cloud technologies (Databricks and Spark) with a minimum of 4-5 years of experience. Recommender systems and real-time predictions: Expertise in developing sophisticated recommender systems, including the application of real-time prediction frameworks. Machine learning algorithms: In-depth experience with complex algorithms such as logistic regression, random forest, XGBoost, advanced neural networks, and ensemble methods, as well as KNN, SVM, linear regression, lasso regression, and k-means. Desirable qualifications: Generative AI tools knowledge: Proficiency with tools and platforms for generative AI (such as OpenAI and Hugging Face Transformers). Databricks and Unity Catalog: Experience leveraging Databricks and Unity Catalog for robust data management, model deployment, and tracking. Working experience with CI/CD tools such as Git and Bitbucket.
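A hedged sketch of the boosting-based classification mentioned above (e.g., for fraud detection), using XGBoost on synthetic data. The hyperparameters and class-imbalance weighting are illustrative choices, not the product's tuned values.

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 95% negatives) mimics a fraud-style problem.
X, y = make_classification(n_samples=5_000, weights=[0.95], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

model = xgb.XGBClassifier(
    n_estimators=300, max_depth=5, learning_rate=0.05,
    scale_pos_weight=19,  # roughly negatives/positives to offset the imbalance
    eval_metric="auc",
)
model.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))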

Posted 1 week ago

Apply

4.0 - 9.0 years

8 - 12 Lacs

Pune

Work from Office

Source: Naukri

Amdocs helps those who build the future to make it amazing. With our market-leading portfolio of software products and services, we unlock our customers' innovative potential, empowering them to provide next-generation communication and media experiences for both individual end users and enterprise customers. Our employees around the globe are here to accelerate service providers' migration to the cloud, enable them to differentiate in the 5G era, and digitalize and automate their operations. Listed on the NASDAQ Global Select Market, Amdocs had revenue of $5.00 billion in fiscal 2024. For more information, visit www.amdocs.com. In one sentence: We are seeking a Data Engineer with advanced expertise in Databricks SQL, PySpark, Spark SQL, and workflow orchestration using Airflow. The successful candidate will lead critical projects, including migrating SQL Server stored procedures to Databricks notebooks, designing incremental data pipelines, and orchestrating workflows in Azure Databricks. What will your job look like? Migrate SQL Server stored procedures to Databricks notebooks, leveraging PySpark and Spark SQL for complex transformations. Design, build, and maintain incremental data load pipelines to handle dynamic updates from various sources, ensuring scalability and efficiency. Develop robust data ingestion pipelines to load data into the Databricks bronze layer from relational databases, APIs, and file systems. Implement incremental data transformation workflows to update silver and gold layer datasets in near real-time, adhering to Delta Lake best practices. Integrate Airflow with Databricks to orchestrate end-to-end workflows, including dependency management, error handling, and scheduling. Understand business and technical requirements, translating them into scalable Databricks solutions. Optimize Spark jobs and queries for performance, scalability, and cost-efficiency in a distributed environment. Implement robust data quality checks, monitoring solutions, and governance frameworks within Databricks. Collaborate with team members on Databricks best practices, reusable solutions, and incremental loading strategies. All you need is... A Bachelor's degree in computer science, information systems, or a related discipline. 4+ years of hands-on experience with Databricks, including expertise in Databricks SQL, PySpark, and Spark SQL. Proven experience in incremental data loading techniques into Databricks, leveraging Delta Lake's features (e.g., time travel, MERGE INTO). Strong understanding of data warehousing concepts, including data partitioning and indexing for efficient querying. Proficiency in T-SQL and experience in migrating SQL Server stored procedures to Databricks. Solid knowledge of Azure cloud services, particularly Azure Databricks and Azure Data Lake Storage. Expertise in Airflow integration for workflow orchestration, including designing and managing DAGs. Familiarity with version control systems (e.g., Git) and CI/CD pipelines for data engineering workflows. Excellent analytical and problem-solving skills with a focus on detail-oriented development. Preferred qualifications: Advanced knowledge of Delta Lake optimizations, such as compaction, Z-ordering, and vacuuming. Experience with real-time streaming data pipelines using tools like Kafka or Azure Event Hubs. Familiarity with advanced Airflow features, such as SLA monitoring and external task dependencies. Certifications such as Databricks Certified Associate Developer for Apache Spark or equivalent. Experience in Agile development methodologies. Why you will love this job: You will be able to use your specific insights to lead business change on a large scale and drive transformation within our organization. You will be a key member of a global, dynamic and highly collaborative team with various possibilities for personal and professional development. You will have the opportunity to work in a multinational environment for the global market leader in its field. We offer a wide range of stellar benefits including health, dental, vision, and life insurance, as well as paid time off, sick time, and parental leave!
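A minimal sketch of the incremental Delta Lake upsert pattern (MERGE INTO) the listing highlights; the table names, join key, and change-feed path are assumptions for illustration.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("incremental-upsert").getOrCreate()

# New and changed rows arriving from the bronze layer (path is a placeholder).
updates = spark.read.format("delta").load("/mnt/bronze/orders_changes")

# Upsert into the silver table: update matched rows, insert new ones.
silver = DeltaTable.forName(spark, "silver.orders")
(silver.alias("t")
 .merge(updates.alias("s"), "t.order_id = s.order_id")
 .whenMatchedUpdateAll()      # apply changed rows
 .whenNotMatchedInsertAll()   # insert brand-new rows
 .execute())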

Posted 1 week ago

Apply

6.0 - 11.0 years

15 - 25 Lacs

Chennai, Coimbatore, Bengaluru

Work from Office

Source: Naukri

Hi Professionals, we are looking for a Senior Data Engineer for a permanent role. Work location: Hybrid - Chennai, Coimbatore, or Bangalore. Experience: 6 to 12 years. Notice period: 0 to 15 days, or immediate joiner. Skills: 1. Python 2. PySpark 3. SQL 4. AWS 5. GCP 6. MLOps. Interested candidates can send their resume to gowtham.veerasamy@wavicledata.com.

Posted 1 week ago

Apply

5.0 - 10.0 years

10 - 15 Lacs

Chennai, Bengaluru

Work from Office

Source: Naukri

Job requisition ID: JR1027452. Overall responsibilities: Data pipeline development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and accuracy. Data ingestion: Implement and manage data ingestion processes from a variety of sources (e.g., relational databases, APIs, file systems) to the data lake or data warehouse on CDP. Data transformation and processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business requirements. Performance optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing the runtime of ETL processes. Data quality and validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline. Automation and orchestration: Automate data workflows using tools like Apache Oozie, Airflow, or similar orchestration tools within the Cloudera ecosystem. Monitoring and maintenance: Monitor pipeline performance, troubleshoot issues, and perform routine maintenance on the Cloudera Data Platform and associated data processes. Collaboration: Work closely with other data engineers, analysts, product managers, and other stakeholders to understand data requirements and support various data-driven initiatives. Documentation: Maintain thorough documentation of data engineering processes, code, and pipeline configurations. Category-wise technical skills: PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization techniques. Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and HBase. Data warehousing: Knowledge of data warehousing concepts, ETL best practices, and experience with SQL-based tools (e.g., Hive, Impala). Big data technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools. Orchestration and scheduling: Experience with Apache Oozie, Airflow, or similar orchestration frameworks. Scripting and automation: Strong scripting skills in Linux. Experience: 5-12 years of experience as a Data Engineer, with a strong focus on PySpark and the Cloudera Data Platform. Proven track record of implementing data engineering best practices. Experience in data ingestion, transformation, and optimization on the Cloudera Data Platform. Day-to-day activities: Design, develop, and maintain ETL pipelines using PySpark on CDP. Implement and manage data ingestion processes from various sources. Process, cleanse, and transform large datasets using PySpark. Conduct performance tuning and optimization of ETL processes. Implement data quality checks and validation routines. Automate data workflows using orchestration tools. Monitor pipeline performance and troubleshoot issues. Collaborate with team members to understand data requirements. Maintain documentation of data engineering processes and configurations. Qualifications: Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field. Relevant certifications in PySpark and Cloudera technologies are a plus. Soft skills: Strong analytical and problem-solving skills. Excellent verbal and written communication abilities. Ability to work independently and collaboratively in a team environment. Attention to detail and commitment to data quality.
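A minimal PySpark-on-CDP sketch of the Hive-backed ETL described above; the database and table names are placeholders, and the job assumes Hive support is configured on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets the job read and write CDP catalog tables directly.
spark = (SparkSession.builder.appName("cdp-etl")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.table("raw_db.clickstream")

# Cleanse and aggregate into a daily summary.
daily = (raw.filter(F.col("event_type").isNotNull())
         .groupBy("event_date", "event_type")
         .agg(F.count("*").alias("events")))

# Partitioned write keeps Hive/Impala queries on recent days cheap.
(daily.write.mode("overwrite")
 .partitionBy("event_date")
 .saveAsTable("analytics_db.daily_events"))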

Posted 1 week ago

Apply

6.0 - 10.0 years

4 - 9 Lacs

Hyderabad, Bengaluru, Mumbai (All Areas)

Work from Office

Source: Naukri

Role: Java Spark Developer. Technical skill set: Spark / Java Big Data. Experience: 6 to 10 years. Location: Bengaluru, Mumbai, Hyderabad. * Must-have * Spark programming in Java/J2EE; Oracle Database, Microservices, Spring Boot, AWS. * Good-to-have * 1. Experience in writing Spark programs for Big Data Hadoop. 2. Good hands-on experience in Java, Microservices, Spring Boot, AWS, and Spark programming. 3. Ability to understand and write shell scripts in Unix. 4. Java/J2EE experience is a plus, along with experience working in an Agile environment.

Posted 1 week ago

Apply

3.0 - 6.0 years

9 - 18 Lacs

Mumbai, Navi Mumbai

Work from Office

Source: Naukri

About us: Celebal Technologies is a leading solution services company providing services in the fields of Data Science, Big Data, Enterprise Cloud, and Automation. We are at the forefront of leveraging cutting-edge technologies to drive innovation and enhance our business processes. As part of our commitment to staying ahead in the industry, we are seeking a talented and experienced Data & AI Engineer with strong Azure cloud competencies to join our dynamic team. Job summary: We are looking for a highly skilled Data Scientist with deep expertise in time series forecasting, particularly in demand forecasting and customer lifetime value (CLV) analytics. The ideal candidate will be proficient in Python or PySpark, have hands-on experience with tools like Prophet and ARIMA, and be comfortable working in Databricks environments. Familiarity with classic ML models and optimization techniques is a plus. Key responsibilities: • Develop, deploy, and maintain time series forecasting models (Prophet, ARIMA, etc.) for demand forecasting and customer behavior modeling. • Design and implement Customer Lifetime Value (CLV) models to drive customer retention and engagement strategies. • Process and analyze large datasets using PySpark or Python (Pandas). • Partner with cross-functional teams to identify business needs and translate them into data science solutions. • Leverage classic ML techniques (classification, regression) and boosting algorithms (e.g., XGBoost, LightGBM) to support broader analytics use cases. • Use Databricks for collaborative development, data pipelines, and model orchestration. • Apply optimization techniques where relevant to improve forecast accuracy and business decision-making. • Present actionable insights and communicate model results effectively to technical and non-technical stakeholders. Required qualifications: • Strong experience in time series forecasting, with hands-on knowledge of Prophet, ARIMA, or equivalent - mandatory. • Proven track record in demand forecasting - highly preferred. • Experience in modeling Customer Lifetime Value (CLV) or similar customer analytics use cases - highly preferred. • Proficiency in Python (Pandas) or PySpark - mandatory. • Experience with Databricks - mandatory. • Solid foundation in statistics, predictive modeling, and machine learning.

Posted 1 week ago

Apply

0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Source: LinkedIn

Hello everyone, we are looking for a Data Engineer with 6+ years of experience. Skills: Databricks, PySpark, and SQL. If interested, please reach out to sandhyay@nextgenprosinc.com.

Posted 1 week ago

Apply

2.0 - 5.0 years

8 - 15 Lacs

Pune

Work from Office

Source: Naukri

We're Hiring: Data Engineer | 2-5 Years Experience | AWS + Real-time Focus. Join our fast-moving team as a Data Engineer where you'll build scalable, real-time data pipelines, own cloud infrastructure, and collaborate across teams to drive data-first decisions. If you're strong in Python, experienced with streaming platforms like Kafka/Kinesis, and have shipped cloud-native data pipelines (preferably AWS), we want to hear from you. Must-haves: 2-5 years of experience in data engineering. Python (Pandas, PySpark, async), SQL, ETL/ELT. Streaming experience (Kafka/Kinesis). AWS cloud stack (Glue, Lambda, S3, Athena). Experience in APIs, data warehousing, and data modelling. Bonus if you know: Docker, Kubernetes, Airflow/dbt, or have a background in MLOps.
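A hedged boto3 sketch of the Kinesis producer/consumer flow this listing asks for. The stream name, region, and shard id are placeholders, and a production consumer would typically use the KCL or enhanced fan-out rather than polling a single shard.

import json
import boto3

kinesis = boto3.client("kinesis", region_name="ap-south-1")
STREAM = "orders-stream"  # placeholder stream name

# Producer side: one JSON record per business event.
kinesis.put_record(
    StreamName=STREAM,
    Data=json.dumps({"order_id": 42, "amount": 199.0}),
    PartitionKey="42",
)

# Consumer side: read the latest records from one shard.
shard_it = kinesis.get_shard_iterator(
    StreamName=STREAM,
    ShardId="shardId-000000000000",
    ShardIteratorType="LATEST",
)["ShardIterator"]
batch = kinesis.get_records(ShardIterator=shard_it, Limit=100)
for rec in batch["Records"]:
    print(json.loads(rec["Data"]))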

Posted 1 week ago

Apply

6.0 years

0 Lacs

India

Remote

Source: LinkedIn

***Immediate requirement*** Job title: Databricks Expert (PySpark and Snowflake mandatory). Years of experience: 6+ years. Job type: Contract. Contract duration: 6-12 months (potential to extend or convert to permanent). Location: India. Work type: Remote. Salary range: 15-17 LPA. Start date: Immediate (notice period/joining within 1-2 weeks). **Apply only if you can join within 1-2 weeks** Job summary: We're looking for a Databricks expert with strong hands-on experience in PySpark and Snowflake to design and optimize large-scale data pipelines. The ideal candidate should be proficient in developing ETL workflows within Databricks and integrating with Snowflake for data warehousing solutions. Experience with AWS services (like S3, Lambda, or Glue) is a plus. This role requires strong problem-solving skills and a deep understanding of data engineering best practices. Apply now!
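A minimal sketch of the Databricks-to-Snowflake integration this contract centres on, using the Snowflake Spark connector. All connection options and table names are placeholders, and spark refers to the ambient Databricks session.

# Runs inside a Databricks notebook where the Snowflake Spark connector is
# available; every option below is a placeholder, not a real account.
sf_options = {
    "sfURL": "example.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "***",          # use a Databricks secret scope in practice
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

# Read a Snowflake table into a Spark DataFrame.
df = (spark.read.format("net.snowflake.spark.snowflake")
      .options(**sf_options)
      .option("dbtable", "RAW_ORDERS")
      .load())

# Transform in PySpark, then write the result back to Snowflake.
curated = df.dropDuplicates(["ORDER_ID"])
(curated.write.format("net.snowflake.spark.snowflake")
 .options(**sf_options)
 .option("dbtable", "CURATED_ORDERS")
 .mode("overwrite")
 .save())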

Posted 1 week ago

Apply

7.0 years

0 Lacs

Noida, Uttar Pradesh, India

On-site

Source: LinkedIn

At EY, you'll have the chance to build a career as unique as you are, with the global scale, support, inclusive culture and technology to become the best version of you. And we're counting on your unique voice and perspective to help EY become even better, too. Join us and build an exceptional experience for yourself, and a better working world for all. EY - Strategy and Transactions - SaT - DnA Associate Manager. EY's Data n' Analytics team is a multi-disciplinary technology team delivering client projects and solutions across Data Management, Visualization, Business Analytics and Automation. The assignments cover a wide range of countries and industry sectors. The opportunity: We're looking for an Associate Manager - Data Engineering. The main objective of the role is to support cloud and on-prem platform analytics and data engineering projects initiated across engagement teams. The role will primarily involve conceptualizing, designing, developing, deploying and maintaining complex technology solutions which help EY solve business problems for the clients. This role will work closely with technical architects, product and business subject matter experts (SMEs), back-end developers and other solution architects, and is also on-shore facing. This role will be instrumental in designing, developing, and evolving modern data warehousing solutions and data integration build-outs using cutting-edge tools and platforms for both on-prem and cloud architectures. In this role you will come up with design specifications, documentation, and development of data migration mappings and transformations for a modern data warehouse set-up/data mart creation, and define robust ETL processing to collect and scrub both structured and unstructured data, providing self-serve capabilities (OLAP) in order to create impactful decision analytics reporting. Your key responsibilities: Evaluating and selecting data warehousing tools for business intelligence, data population, data management, metadata management and warehouse administration for both on-prem and cloud-based engagements. Strong working knowledge across the technology stack including ETL, ELT, data analysis, metadata, data quality, audit and design. Design, develop, and test in ETL tool environments (GUI/canvas-driven tools to create workflows). Experience in design documentation (data mapping, technical specifications, production support, data dictionaries, test cases, etc.).
Provide technical leadership to a team of data warehouse and business intelligence developers. Coordinate with other technology users to design and implement matters of data governance, data harvesting, cloud implementation strategy, privacy, and security. Adhere to ETL/data warehouse development best practices. Be responsible for data orchestration, ingestion, ETL and reporting architecture for both on-prem and cloud (MS Azure/AWS/GCP). Assist the team with performance tuning for ETL and database processes. Skills and attributes for success: Minimum of 7 years of total experience with 3+ years in the data warehousing/business intelligence field. Solid hands-on 3+ years of professional experience with the creation and implementation of data warehouses on client engagements, and helping create enhancements to a data warehouse. Strong knowledge of data architecture for staging and reporting schemas, data models and cutover strategies using industry-standard tools and technologies. Architecture design and implementation experience with medium to complex on-prem to cloud migrations with any of the major cloud platforms (preferably AWS/Azure/GCP). Minimum 3+ years of experience in Azure database offerings [relational, NoSQL, data warehouse]. 2+ years of hands-on experience in various Azure services preferred: Azure Data Factory, Kafka, Azure Data Explorer, Storage, Azure Data Lake, Azure Synapse Analytics, Azure Analysis Services & Databricks. Minimum of 3 years of hands-on database design, modeling and integration experience with relational data sources, such as SQL Server databases, Oracle/MySQL, Azure SQL and Azure Synapse. Strong in PySpark, SparkSQL. Knowledge of and direct experience using business intelligence reporting tools (Power BI, Alteryx, OBIEE, Business Objects, Cognos, Tableau, MicroStrategy, SSAS Cubes etc.). Strong creative instincts related to data analysis and visualization. Aggressive curiosity to learn the business methodology, data model and user personas. Strong understanding of BI and DWH best practices, analysis, visualization, and latest trends. Experience with the software development lifecycle (SDLC) and principles of product development such as installation, upgrade and namespace management. Willingness to mentor team members. Solid analytical, technical and problem-solving skills. Excellent written and verbal communication skills. To qualify for the role, you must have: A Bachelor's or equivalent degree in computer science or a related field (an advanced degree or equivalent business experience is preferred). A fact-driven and analytically minded approach with excellent attention to detail. Hands-on experience with data engineering tasks such as building analytical data records, and experience manipulating and analyzing large volumes of data. Relevant work experience of minimum 6 to 8 years in a Big 4 or technology/consulting set-up. Ideally, you'll also have: The ability to think strategically/end-to-end with a result-oriented mindset. The ability to build rapport within the firm and win the trust of clients. Willingness to travel extensively and to work on client sites/practice office locations. Experience in Snowflake. What we look for: A team of people with commercial acumen, technical experience and enthusiasm to learn new things in this fast-moving environment. An opportunity to be a part of a market-leading, multi-disciplinary team of 1,400+ professionals, in the only integrated global transaction business worldwide.
Opportunities to work with EY SaT practices globally with leading businesses across a range of industries. What working at EY offers: At EY, we're dedicated to helping our clients, from start-ups to Fortune 500 companies, and the work we do with them is as varied as they are. You get to work with inspiring and meaningful projects. Our focus is education and coaching alongside practical experience to ensure your personal development. We value our employees and you will be able to control your own development with an individual progression plan. You will quickly grow into a responsible role with challenging and stimulating assignments. Moreover, you will be part of an interdisciplinary environment that emphasizes high quality and knowledge exchange. Plus, we offer: Support, coaching and feedback from some of the most engaging colleagues around. Opportunities to develop new skills and progress your career. The freedom and flexibility to handle your role in a way that's right for you. EY | Building a better working world. EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets. Enabled by data and technology, diverse EY teams in over 150 countries provide trust through assurance and help clients grow, transform and operate. Working across assurance, consulting, law, strategy, tax and transactions, EY teams ask better questions to find new answers for the complex issues facing our world today.

Posted 1 week ago

Apply

9.0 years

0 Lacs

Noida, Uttar Pradesh, India

On-site

Source: LinkedIn

At EY, you'll have the chance to build a career as unique as you are, with the global scale, support, inclusive culture and technology to become the best version of you. And we're counting on your unique voice and perspective to help EY become even better, too. Join us and build an exceptional experience for yourself, and a better working world for all. EY - Strategy and Transactions - SaT - DnA Manager. EY's Data n' Analytics team is a multi-disciplinary technology team delivering client projects and solutions across Data Management, Visualization, Business Analytics and Automation. The assignments cover a wide range of countries and industry sectors. The opportunity: We're looking for a Manager - Data Engineering. The main objective of the role is to support cloud and on-prem platform analytics and data engineering projects initiated across engagement teams. The role will primarily involve conceptualizing, designing, developing, deploying and maintaining complex technology solutions which help EY solve business problems for the clients. This role will work closely with technical architects, product and business subject matter experts (SMEs), back-end developers and other solution architects, and is also on-shore facing. This role will be instrumental in designing, developing, and evolving modern data warehousing solutions and data integration build-outs using cutting-edge tools and platforms for both on-prem and cloud architectures. In this role you will come up with design specifications, documentation, and development of data migration mappings and transformations for a modern data warehouse set-up/data mart creation, and define robust ETL processing to collect and scrub both structured and unstructured data, providing self-serve capabilities (OLAP) in order to create impactful decision analytics reporting. Your key responsibilities: Evaluating and selecting data warehousing tools for business intelligence, data population, data management, metadata management and warehouse administration for both on-prem and cloud-based engagements. Strong working knowledge across the technology stack including ETL, ELT, data analysis, metadata, data quality, audit and design. Design, develop, and test in ETL tool environments (GUI/canvas-driven tools to create workflows). Experience in design documentation (data mapping, technical specifications, production support, data dictionaries, test cases, etc.).
Provide technical leadership to a team of data warehouse and business intelligence developers. Coordinate with other technology users to design and implement matters of data governance, data harvesting, cloud implementation strategy, privacy, and security. Adhere to ETL/data warehouse development best practices. Be responsible for data orchestration, ingestion, ETL and reporting architecture for both on-prem and cloud (MS Azure/AWS/GCP). Assist the team with performance tuning for ETL and database processes. Skills and attributes for success: 9-11 years of total experience with 5+ years in the data warehousing/business intelligence field. Solid hands-on 5+ years of professional experience with the creation and implementation of data warehouses on client engagements, and helping create enhancements to a data warehouse. Strong knowledge of data architecture for staging and reporting schemas, data models and cutover strategies using industry-standard tools and technologies. Architecture design and implementation experience with medium to complex on-prem to cloud migrations with any of the major cloud platforms (preferably AWS/Azure/GCP). Minimum 3+ years of experience in Azure database offerings [relational, NoSQL, data warehouse]. 3+ years of hands-on experience in various Azure services preferred: Azure Data Factory, Kafka, Azure Data Explorer, Storage, Azure Data Lake, Azure Synapse Analytics, Azure Analysis Services & Databricks. Minimum of 5 years of hands-on database design, modeling and integration experience with relational data sources, such as SQL Server databases, Oracle/MySQL, Azure SQL and Azure Synapse. Strong in PySpark, SparkSQL. Knowledge of and direct experience using business intelligence reporting tools (Power BI, Alteryx, OBIEE, Business Objects, Cognos, Tableau, MicroStrategy, SSAS Cubes etc.). Strong creative instincts related to data analysis and visualization. Aggressive curiosity to learn the business methodology, data model and user personas. Strong understanding of BI and DWH best practices, analysis, visualization, and latest trends. Experience with the software development lifecycle (SDLC) and principles of product development such as installation, upgrade and namespace management. Willingness to mentor team members. Solid analytical, technical and problem-solving skills. Excellent written and verbal communication skills. To qualify for the role, you must have: A Bachelor's or equivalent degree in computer science or a related field (an advanced degree or equivalent business experience is preferred). A fact-driven and analytically minded approach with excellent attention to detail. Hands-on experience with data engineering tasks such as building analytical data records, and experience manipulating and analysing large volumes of data. Relevant work experience of minimum 9 to 11 years in a Big 4 or technology/consulting set-up. Ideally, you'll also have: The ability to think strategically/end-to-end with a result-oriented mindset. The ability to build rapport within the firm and win the trust of clients. Willingness to travel extensively and to work on client sites/practice office locations. Experience with Snowflake. What we look for: A team of people with commercial acumen, technical experience and enthusiasm to learn new things in this fast-moving environment. An opportunity to be a part of a market-leading, multi-disciplinary team of 1,400+ professionals, in the only integrated global transaction business worldwide.
Opportunities to work with EY SaT practices globally with leading businesses across a range of industries. What working at EY offers: At EY, we're dedicated to helping our clients, from start-ups to Fortune 500 companies, and the work we do with them is as varied as they are. You get to work with inspiring and meaningful projects. Our focus is education and coaching alongside practical experience to ensure your personal development. We value our employees and you will be able to control your own development with an individual progression plan. You will quickly grow into a responsible role with challenging and stimulating assignments. Moreover, you will be part of an interdisciplinary environment that emphasizes high quality and knowledge exchange. Plus, we offer: Support, coaching and feedback from some of the most engaging colleagues around. Opportunities to develop new skills and progress your career. The freedom and flexibility to handle your role in a way that's right for you. EY | Building a better working world. EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets. Enabled by data and technology, diverse EY teams in over 150 countries provide trust through assurance and help clients grow, transform and operate. Working across assurance, consulting, law, strategy, tax and transactions, EY teams ask better questions to find new answers for the complex issues facing our world today.

Posted 1 week ago

Apply

4.0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site

Source: LinkedIn

We are seeking a skilled and motivated Data Engineer to join our dynamic team. The ideal candidate will have experience in designing, developing, and maintaining scalable data pipelines and architectures using Hadoop, PySpark, ETL processes, and cloud technologies. Role: Senior Data Engineer. Experience: 4-8 years. Job locations: Coimbatore, Chennai, Bangalore, Hyderabad. Responsibilities: Design, develop, and maintain data pipelines for processing large-scale datasets. Build efficient ETL workflows to transform and integrate data from multiple sources. Develop and optimize Hadoop and PySpark applications for data processing. Ensure data quality, governance, and security standards are met across systems. Implement and manage cloud-based data solutions (AWS, Azure, or GCP). Collaborate with data scientists and analysts to support business intelligence initiatives. Troubleshoot performance issues and optimize query executions in big data environments. Stay updated with industry trends and advancements in big data and cloud technologies. Required skills: Strong programming skills in Python, Scala, or Java. Hands-on experience with the Hadoop ecosystem (HDFS, Hive, Spark, etc.). Expertise in PySpark for distributed data processing. Proficiency in ETL tools and workflows (SSIS, Apache NiFi, or custom pipelines). Experience with cloud platforms (AWS, Azure, GCP) and their data-related services. Knowledge of SQL and NoSQL databases. Familiarity with data warehousing concepts and data modeling techniques. Strong analytical and problem-solving skills. Interested candidates can contact us at +91 7305206696 / saranyadevib@talentien.com. Skills: SQL, data warehousing, AWS, cloud, Hadoop, Scala, Java, Python, data engineering, Azure, cloud technologies (AWS, Azure, GCP), ETL processes, data modeling, NoSQL, PySpark, ETL

Posted 1 week ago

Apply

4.0 years

0 Lacs

Coimbatore, Tamil Nadu, India

On-site

Linkedin logo

We are seeking a skilled and motivated Data Engineer to join our dynamic team. The ideal candidate will have experience in designing, developing, and maintaining scalable data pipelines and architectures using Hadoop, PySpark, ETL processes, and cloud technologies.

Role: Senior Data Engineer
Experience: 4-8 years
Job locations: Coimbatore, Chennai, Bangalore, Hyderabad

Responsibilities

Design, develop, and maintain data pipelines for processing large-scale datasets.
Build efficient ETL workflows to transform and integrate data from multiple sources.
Develop and optimize Hadoop and PySpark applications for data processing.
Ensure data quality, governance, and security standards are met across systems.
Implement and manage cloud-based data solutions (AWS, Azure, or GCP).
Collaborate with data scientists and analysts to support business intelligence initiatives.
Troubleshoot performance issues and optimize query executions in big data environments.
Stay updated with industry trends and advancements in big data and cloud technologies.

Required Skills

Strong programming skills in Python, Scala, or Java.
Hands-on experience with the Hadoop ecosystem (HDFS, Hive, Spark, etc.).
Expertise in PySpark for distributed data processing.
Proficiency in ETL tools and workflows (SSIS, Apache NiFi, or custom pipelines).
Experience with cloud platforms (AWS, Azure, GCP) and their data-related services.
Knowledge of SQL and NoSQL databases.
Familiarity with data warehousing concepts and data modeling techniques.
Strong analytical and problem-solving skills.

Interested candidates can contact us at +91 7305206696 / saranyadevib@talentien.com

Skills: SQL, data warehousing, AWS, cloud, Hadoop, Scala, Java, Python, data engineering, Azure, cloud technologies (AWS, Azure, GCP), ETL processes, data modeling, NoSQL, PySpark, ETL

Posted 1 week ago

Apply

5.0 - 8.0 years

15 - 19 Lacs

Noida, Chennai, Bengaluru

Hybrid

Naukri logo

1. Big Data Engineer

Company: LTIMindtree
Mandatory skills: Python, PySpark & AWS
Location: Noida & Pan India
Experience: 5-8 years
Salary: 19 LPA
Share your CV at Muktai.S@alphacom.in

Posted 1 week ago

Apply

7.0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site

Linkedin logo

Job Description: Lead Data Engineer
Experience Level: 7-10 years
WFO - Chennai

Required Skills:

AI/ML skills mandatory
AWS Glue, EMR (Spark), Redshift, Step Functions: deep experience using AWS tools for data processing and pipeline orchestration
Python, SQL, PySpark/Scala: advanced skills in Python and SQL, as well as experience in Spark-based data processing frameworks
Data pipeline performance: expertise in tuning and optimizing data pipelines for performance and scalability
Monitoring & troubleshooting: experience with monitoring data systems and troubleshooting performance bottlenecks

Job Description (JD):

The Lead Data Engineer will oversee the design, development, and optimization of data pipelines that serve as the backbone for data processing and analytics. The candidate should have a strong command of cloud-based data engineering tools and the ability to lead teams in building scalable, high-performance data systems. You will collaborate closely with data scientists, analysts, and product teams to deliver reliable and efficient data architectures on AWS, ensuring they meet both current and future business needs.

Roles and Responsibilities:

Lead Data Pipeline Development: Architect and develop scalable, secure, and optimized data pipelines using AWS Glue, Redshift, EMR (Spark), and Step Functions (a minimal Glue job skeleton follows this listing).
Data Performance Tuning: Optimize data pipelines for performance, ensuring minimal latency and high throughput.
Collaborate with Stakeholders: Work with data scientists, business analysts, and other engineers to understand requirements and deliver effective solutions.
Data Storage & Management: Ensure efficient management of data storage in Redshift, Glue, and other AWS services.
Team Leadership & Mentorship: Guide and mentor a team of data engineers, ensuring best practices for data engineering are followed.
System Monitoring & Troubleshooting: Set up monitoring for all data pipelines and perform proactive troubleshooting to minimize downtime.
Continuous Improvement: Stay up to date with emerging technologies and improve existing data pipelines to enhance performance and scalability.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field.
7+ years of experience in data engineering, with expertise in AWS technologies.
Advanced skills in Python, SQL, and Spark.
Solid understanding of data engineering principles, including ETL processes, performance optimization, and scalability.
Experience in leading teams and mentoring junior engineers.
Excellent communication skills and ability to collaborate with cross-functional teams.

Preferred Skills:

Experience with containerization (Docker, Kubernetes).
Familiarity with Apache Kafka, Kinesis, or other streaming technologies.
Knowledge of machine learning frameworks and their integration into data pipelines.
Familiarity with infrastructure-as-code tools such as Terraform.

Skills

PRIMARY COMPETENCY: Data Engineering
PRIMARY SKILL: DBA / Data Modeling / Data Engineering
PRIMARY SKILL PERCENTAGE: 100
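For orientation on the Glue-centric stack this role describes, here is a minimal AWS Glue job skeleton in PySpark. The database, table, and S3 path are hypothetical placeholders, not details from this posting; treat it as a sketch of the common read-transform-write pattern rather than any employer's actual pipeline.

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog (hypothetical database and table names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Rename and cast columns: (source name, source type, target name, target type).
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "string", "order_id", "string"),
              ("amount", "string", "amount", "double")],
)

# Write the curated output to S3 as Parquet (placeholder bucket/path).
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
job.commit()
```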

Posted 1 week ago

Apply

Exploring PySpark Jobs in India

PySpark, a powerful data processing framework built on top of Apache Spark and Python, is in high demand in the job market in India. With the increasing need for big data processing and analysis, companies are actively seeking professionals with PySpark skills to join their teams. If you are a job seeker looking to excel in the field of big data and analytics, exploring PySpark jobs in India could be a great career move.

Top Hiring Locations in India

Here are 5 major cities in India where companies are actively hiring for PySpark roles:

1. Bangalore
2. Pune
3. Hyderabad
4. Mumbai
5. Delhi

Average Salary Range

The estimated salary range for PySpark professionals in India varies based on experience levels. Entry-level positions can expect to earn around INR 6-8 lakhs per annum, while experienced professionals can earn upwards of INR 15 lakhs per annum.

Career Path

In the field of PySpark, a typical career progression may look like this:

1. Junior Developer
2. Data Engineer
3. Senior Developer
4. Tech Lead
5. Data Architect

Related Skills

In addition to PySpark, professionals in this field are often expected to have or develop skills in:

- Python programming
- Apache Spark
- Big data technologies (Hadoop, Hive, etc.)
- SQL
- Data visualization tools (Tableau, Power BI)

Interview Questions

Here are 25 interview questions you may encounter when applying for PySpark roles; a short code sketch illustrating several of the underlying concepts follows the list:

  • Explain what PySpark is and its main features (basic)
  • What are the advantages of using PySpark over other big data processing frameworks? (medium)
  • How do you handle missing or null values in PySpark? (medium)
  • What is RDD in PySpark? (basic)
  • What is a DataFrame in PySpark and how is it different from an RDD? (medium)
  • How can you optimize performance in PySpark jobs? (advanced)
  • Explain the difference between map and flatMap transformations in PySpark (basic)
  • What is the role of a SparkContext in PySpark? (basic)
  • How do you handle schema inference in PySpark? (medium)
  • What is a SparkSession in PySpark? (basic)
  • How do you join DataFrames in PySpark? (medium)
  • Explain the concept of partitioning in PySpark (medium)
  • What is a UDF in PySpark? (medium)
  • How do you cache DataFrames in PySpark for optimization? (medium)
  • Explain the concept of lazy evaluation in PySpark (medium)
  • How do you handle skewed data in PySpark? (advanced)
  • What is checkpointing in PySpark and how does it help in fault tolerance? (advanced)
  • How do you tune the performance of a PySpark application? (advanced)
  • Explain the use of Accumulators in PySpark (advanced)
  • How do you handle broadcast variables in PySpark? (advanced)
  • What are the different data sources supported by PySpark? (medium)
  • How can you run PySpark on a cluster? (medium)
  • What is the purpose of the PySpark MLlib library? (medium)
  • How do you handle serialization and deserialization in PySpark? (advanced)
  • What are the best practices for deploying PySpark applications in production? (advanced)

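Several of these questions probe the same core APIs. The sketch below is a minimal illustration of a few of them: map vs flatMap on an RDD, DataFrame null handling, a Python UDF, caching and lazy evaluation, and a broadcast join. It assumes a local PySpark installation; all data and names are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("interview-prep").getOrCreate()

# RDD basics: map returns one element per input, flatMap can return many.
rdd = spark.sparkContext.parallelize(["a b", "c d e"])
print(rdd.map(lambda s: s.split()).collect())      # [['a', 'b'], ['c', 'd', 'e']]
print(rdd.flatMap(lambda s: s.split()).collect())  # ['a', 'b', 'c', 'd', 'e']

# DataFrames carry a schema, unlike raw RDDs.
df = spark.createDataFrame([(1, "alice"), (2, None)], ["id", "name"])

# Handling nulls: fill (or drop) missing values.
cleaned = df.na.fill({"name": "unknown"})

# A simple Python UDF (built-in functions are preferred for performance).
shout = F.udf(lambda s: s.upper(), StringType())
cleaned = cleaned.withColumn("name_upper", shout(F.col("name")))

# Cache a DataFrame that will be reused; evaluation stays lazy until an action.
cleaned.cache()
cleaned.count()  # this action materializes the cache

# Broadcast join: hint Spark to ship the small side to every executor.
lookup = spark.createDataFrame([(1, "admin"), (2, "guest")], ["id", "role"])
joined = cleaned.join(F.broadcast(lookup), on="id", how="left")
joined.show()

spark.stop()
```
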
Closing Remark

As you explore PySpark jobs in India, remember to prepare thoroughly for interviews and showcase your expertise confidently. With the right skills and knowledge, you can excel in this field and advance your career in the world of big data and analytics. Good luck!
