
4025 PySpark Jobs - Page 27

JobPe aggregates job results for easy access, but applications are submitted directly on the original job portal.

10.0 - 13.0 years

8 - 17 Lacs

Hyderabad, Chennai, Bengaluru

Hybrid

Source: Naukri

Detailed job description - Skill Set: We are looking for a highly experienced (10+ years), deeply hands-on Data Architect to lead the design, build, and optimization of our data platforms on AWS and Databricks. This role requires a strong blend of architectural vision and direct implementation expertise, ensuring scalable, secure, and performant data solutions from concept to production. Strong hands-on experience in data engineering/architecture, hands-on architectural and implementation experience on AWS and Databricks, and schema modeling are required.
AWS: Deep hands-on expertise with key AWS data services and infrastructure.
Databricks: Expert-level hands-on development with Databricks (Spark SQL, PySpark), Delta Lake, and Unity Catalog.
Coding: Exceptional proficiency in Python, PySpark, Spark, AWS services, and SQL.
Architectural: Strong data modeling and architectural design skills with a focus on practical implementation.
Preferred: AWS/Databricks certifications, experience with streaming technologies, and other data tools.
Design & Build: Lead and personally execute the design, development, and deployment of complex data architectures and pipelines on AWS (S3, Glue, Lambda, Redshift, etc.) and Databricks (PySpark/Spark SQL, Delta Lake, Unity Catalog).
Databricks Expertise: Own the hands-on development, optimization, and performance tuning of Databricks jobs, clusters, and notebooks.
Mandatory Skills: AWS, Databricks
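
For readers unfamiliar with the stack this listing keeps naming, here is a minimal, illustrative PySpark sketch of the kind of hands-on work described: reading raw files from S3, cleansing them, and registering a Delta table. The bucket, paths, column names, and table name are hypothetical placeholders, not details from this posting.

```python
# Minimal sketch of a Databricks/PySpark load into Delta Lake.
# All paths, columns, and table names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-delta-load").getOrCreate()

raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-raw-bucket/orders/")          # hypothetical S3 landing path
)

cleaned = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
    .dropDuplicates(["order_id"])                     # basic de-duplication on the key
)

(
    cleaned.write
    .format("delta")                                  # Delta Lake storage format
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.orders")                  # table registered in the metastore / Unity Catalog
)
```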

Posted 6 days ago

Apply

4.0 - 9.0 years

1 - 2 Lacs

Kolkata, Pune, Chennai

Hybrid

Source: Naukri

Role & responsibilities: Developing Modern Data Warehouse solutions using Databricks and the AWS/Azure stack. Ability to provide forward-thinking solutions in the data engineering and analytics space. Collaborate with DW/BI leads to understand new ETL pipeline development requirements. Triage issues to find gaps in existing pipelines and fix them. Work with the business to understand reporting-layer needs and develop data models to fulfil them. Help junior team members resolve issues and technical challenges. Drive technical discussions with the client architect and team members. Orchestrate the data pipelines in a scheduler via Airflow (see the Airflow sketch below).

Preferred candidate profile: Bachelor's and/or master's degree in computer science or equivalent experience. Must have 3+ years of total IT experience and 3+ years' experience in data warehouse/ETL projects. Deep understanding of Star and Snowflake dimensional modelling. Strong knowledge of data management principles. Good understanding of the Databricks Data & AI platform and Databricks Delta Lake architecture. Should have hands-on experience in SQL, Python and Spark (PySpark). Candidate must have experience in the AWS/Azure stack. Desirable to have ETL with batch and streaming (Kinesis). Experience in building ETL/data warehouse transformation processes. Experience with Apache Kafka for use with streaming/event-based data. Experience with other open-source big data products, including Hadoop (Hive, Pig, Impala). Experience with open-source non-relational/NoSQL data repositories (MongoDB, Cassandra, Neo4j). Experience working with structured and unstructured data, including imaging and geospatial data. Experience working in a DevOps environment with tools such as Terraform, CircleCI, Git. Proficiency in RDBMS, complex SQL, PL/SQL, Unix shell scripting, performance tuning and troubleshooting. Databricks Certified Data Engineer Associate/Professional certification (desirable). Comfortable working in a dynamic, fast-paced, innovative environment with several ongoing concurrent projects. Should have experience working in Agile methodology. Strong verbal and written communication skills. Strong analytical and problem-solving skills with high attention to detail.
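
As referenced above, a minimal Airflow sketch of the kind of pipeline orchestration this role mentions. The DAG id, schedule, and the ETL callable are illustrative assumptions; in practice the task would typically trigger a Databricks or PySpark job rather than a plain Python function.

```python
# Illustrative Airflow DAG for daily warehouse-load orchestration.
# DAG id, schedule, and the callable are placeholder assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_daily_etl(**context):
    # Placeholder for the actual PySpark/Databricks ETL step.
    print(f"Running warehouse load for {context['ds']}")


with DAG(
    dag_id="dw_daily_load",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",     # run once per day
    catchup=False,
) as dag:
    extract_load = PythonOperator(
        task_id="run_daily_etl",
        python_callable=run_daily_etl,
    )
```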

Posted 6 days ago

Apply

4.0 years

0 Lacs

Pune, Maharashtra, India

On-site

Source: LinkedIn

Location: Hyderabad / Chennai / Pune / Mumbai (Hybrid). Notice Period: up to 60 days.

About Us: Zemoso Technologies is a Software Product Market Fit Studio that brings Silicon Valley-style rapid prototyping and rapid application builds to entrepreneurs and corporate innovation. We offer Innovation as a Service, working on ideas from scratch and taking them to the Product Market Fit stage using Design Thinking -> Lean Execution -> Agile Methodology. We were featured among Deloitte's fastest-growing tech companies from India three times (2016, 2018, and 2019), and in the Deloitte Technology Fast 500 Asia Pacific in both 2016 and 2018. We are located in Hyderabad, India and Dallas, US, and have recently incorporated another office in Waterloo, Canada.

What You Will Do:
- Develop innovative software solutions using design thinking, lean, and agile methodologies.
- Work on high-quality software products using the latest technologies and platforms.
- Collaborate with fast-paced, dynamic teams to deliver value-driven client experiences.
- Mentor and contribute to the growth of the next generation of developers.

Must-Have Skills:
- Experience: 4+ years.
- Strong proficiency in the Python programming language and Django.
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.

Nice-to-Have Qualifications:
- Experience with Pandas and PySpark.
- Product and customer-centric mindset.
- Great object-oriented skills, including design patterns.
- Good to great problem-solving and communication skills.
- Experience in working with cross-border, distributed teams.

Get to know us better: https://www.zemosolabs.com

Posted 6 days ago

Apply

6.0 - 11.0 years

30 - 40 Lacs

Chennai

Work from Office

Source: Naukri

Role & responsibilities: Data Engineer with experience working on data migration projects. Experience with the Azure data stack, including Data Lake Storage, Synapse Analytics, ADF, Azure Databricks, and Azure ML. Solid knowledge of Python, PySpark and other Python packages. Familiarity with ML workflows and collaboration with data science teams. Strong understanding of data governance, security, and compliance in financial domains. Experience with CI/CD tools and version control systems (e.g., Azure DevOps, Git). Experience modularizing and migrating ML logic. Note: We encourage interested candidates to submit their updated CVs to mohan.kumar@changepond.com

Posted 6 days ago

Apply

6.0 - 10.0 years

12 - 15 Lacs

Chennai, Coimbatore, Mumbai (All Areas)

Work from Office

Source: Naukri

We have an urgent requirement for the role of Senior Azure Data Engineer. Experience: 6 years. Notice Period: 0-15 days max. Position: C2H. Should be able to work flexible timings; communication should be excellent. Must Have: strong understanding of ADF, Azure, Databricks, and PySpark; strong understanding of SQL, ADO, and Power BI; Unity Catalog is mandatory.

Posted 6 days ago

Apply

10.0 years

0 Lacs

Pune, Maharashtra, India

On-site

Source: LinkedIn

Data Ops Capability Deployment - Analyst is a seasoned professional role. It applies in-depth disciplinary knowledge, contributing to the development of new solutions/frameworks/techniques and the improvement of processes and workflows for the Enterprise Data function. The role integrates subject matter and industry expertise within a defined area and requires an in-depth understanding of how areas collectively integrate within the sub-function, as well as coordinating and contributing to the objectives of the function and the overall business. The primary purpose of this role is to perform data analytics and data analysis across different asset classes, and to build data science/tooling capabilities within the team. This will involve working closely with the wider Enterprise Data team, in particular the front-to-back leads, to deliver business priorities. The role sits within the B&I Data Capabilities team within Enterprise Data. The team manages the Data Quality/Metrics/Controls program in addition to a broad remit to implement and embed improved data governance and data management practices throughout the region. The Data Quality program is centered on enhancing Citi's approach to data risk and addressing regulatory commitments in this area.

Key Responsibilities: Hands-on data engineering background with a thorough understanding of distributed data platforms and cloud services. Sound understanding of data architecture and data integration with enterprise applications. Research and evaluate new data technologies, data mesh architecture, and self-service data platforms. Work closely with the Enterprise Architecture team on the definition and refinement of the overall data strategy. Should be able to address performance bottlenecks, design batch orchestrations, and deliver reporting capabilities. Ability to perform complex data analytics (data cleansing, transformation, joins, aggregation, etc.) on large, complex datasets. Build analytics dashboards and data science capabilities for Enterprise Data platforms. Communicate complicated findings and propose solutions to a variety of stakeholders. Understand business and functional requirements provided by business analysts and convert them into technical design documents. Work closely with cross-functional teams, e.g., Business Analysis, Product Assurance, Platforms and Infrastructure, Business Office, Control, and Production Support. Prepare handover documents and manage SIT, UAT, and implementation. Demonstrate an in-depth understanding of how the development function integrates within the overall business/technology to achieve objectives; requires a good understanding of the banking industry. Perform other duties and functions as assigned. Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.

Skills & Qualifications: 10+ years of active development background; experience in Financial Services or Finance IT is required. Experience with Data Quality/Data Tracing/Data Lineage/Metadata Management tools. Hands-on experience with ETL using PySpark on distributed platforms, along with data ingestion, Spark optimization, resource utilization, capacity planning, and batch orchestration. In-depth understanding of Hive, HDFS, Airflow, and job schedulers. Strong programming skills in Python with experience in data manipulation and analysis libraries (Pandas, NumPy). Should be able to write complex SQL/stored procedures. Should have worked with DevOps, Jenkins/Lightspeed, Git, and Copilot. Strong knowledge of one or more BI visualization tools such as Tableau or Power BI. Proven experience in implementing a data lake/data warehouse for enterprise use cases. Exposure to analytical tools and AI/ML is desired.

Education: Bachelor's/University degree or master's degree in Information Systems, Business Analysis, or Computer Science.

------------------------------------------------------
Job Family Group: Data Governance
------------------------------------------------------
Job Family: Data Governance Foundation
------------------------------------------------------
Time Type: Full time
------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. View Citi's EEO Policy Statement and the Know Your Rights poster.
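
A hedged sketch of the data-quality style checks this program centres on: computing simple completeness and duplicate metrics over a large table with PySpark. The table name, key column, and thresholds are illustrative assumptions, not Citi's actual controls.

```python
# Illustrative PySpark data-quality metrics (completeness and duplicates).
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-metrics").getOrCreate()

df = spark.table("positions_snapshot")            # hypothetical source table
total = df.count()
safe_total = max(total, 1)                        # avoid division by zero on an empty table

metrics = {
    "row_count": total,
    "null_trade_id_pct": df.filter(F.col("trade_id").isNull()).count() / safe_total * 100,
    "duplicate_trade_id_count": total - df.dropDuplicates(["trade_id"]).count(),
}

for name, value in metrics.items():
    print(f"{name}: {value}")
```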

Posted 6 days ago

Apply

3.0 years

0 Lacs

Greater Chennai Area

On-site

Source: LinkedIn

Chennai / Bangalore / Hyderabad

Who We Are: Tiger Analytics is a global leader in AI and analytics, helping Fortune 1000 companies solve their toughest challenges. We offer full-stack AI and analytics services and solutions to empower businesses to achieve real outcomes and value at scale. We are on a mission to push the boundaries of what AI and analytics can do to help enterprises navigate uncertainty and move forward decisively. Our purpose is to provide certainty to shape a better tomorrow. Our team of 4000+ technologists and consultants is based in the US, Canada, the UK, India, Singapore and Australia, working closely with clients across CPG, Retail, Insurance, BFS, Manufacturing, Life Sciences, and Healthcare. Many of our team leaders rank in Top 10 and 40 Under 40 lists, exemplifying our dedication to innovation and excellence. We are Great Place to Work-Certified™ (2022-24), recognized by analyst firms such as Forrester, Gartner, HFS, Everest, ISG and others. We have been ranked among the 'Best' and 'Fastest Growing' analytics firms by Inc., Financial Times, Economic Times and Analytics India Magazine.

Curious about the role? What would your typical day look like? We are looking for a Senior Analyst or Machine Learning Engineer who will work on a broad range of cutting-edge data analytics and machine learning problems across a variety of industries. More specifically, you will: Engage with clients to understand their business context. Translate business problems and technical constraints into technical requirements for the desired analytics solution. Collaborate with a team of data scientists and engineers to embed AI and analytics into the business decision processes.

What do we expect? 3+ years of experience with at least 1+ years of relevant DS experience. Proficient in structured Python, PySpark, and machine learning (experience in productionizing models). Proficiency in AWS cloud technologies is mandatory. Experience with, and a good understanding of, SageMaker/Databricks. Experience with MLOps frameworks (e.g., MLflow or Kubeflow). Follows good software engineering practices and has an interest in building reliable and robust software. Good understanding of DS concepts and the DS model lifecycle. Working knowledge of Linux or Unix environments, ideally in a cloud environment. Model deployment/model monitoring experience (preferably in the banking domain). CI/CD pipeline creation is good to have. Excellent written and verbal communication skills. B.Tech from a Tier-1 college / M.S. or M.Tech is preferred.

You are important to us, let's stay connected! Every individual comes with a different set of skills and qualities, so even if you don't tick all the boxes for the role today, we urge you to apply, as there might be a suitable/unique role for you tomorrow. We are an equal-opportunity employer. Our diverse and inclusive culture and values guide us to listen, trust, respect, and encourage people to grow the way they desire. Note: The designation will be commensurate with expertise and experience. Compensation packages are among the best in the industry. Additional Benefits: Health insurance (self & family), virtual wellness platform, and knowledge communities.

Posted 6 days ago

Apply

4.0 - 8.0 years

6 - 10 Lacs

Pune, Gurugram

Work from Office

Source: Naukri

ZS is a place where passion changes lives. As a management consulting and technology firm focused on improving life and how we live it, our most valuable asset is our people. Here you'll work side-by-side with a powerful collective of thinkers and experts shaping life-changing solutions for patients, caregivers and consumers worldwide. ZSers drive impact by bringing a client-first mentality to each and every engagement. We partner collaboratively with our clients to develop custom solutions and technology products that create value and deliver company results across critical areas of their business. Bring your curiosity for learning, bold ideas, courage and passion to drive life-changing impact to ZS. Our most valuable asset is our people. At ZS we honor the visible and invisible elements of our identities, personal experiences and belief systems - the ones that comprise us as individuals, shape who we are and make us unique. We believe your personal interests, identities, and desire to learn are part of your success here. Learn more about our diversity, equity, and inclusion efforts and the networks ZS supports to assist our ZSers in cultivating community spaces, obtaining the resources they need to thrive, and sharing the messages they are passionate about.

Business Technology: ZS's Technology group focuses on scalable strategies, assets and accelerators that deliver enterprise-wide transformation to our clients via cutting-edge technology. We leverage digital and technology solutions to optimize business processes, enhance decision-making, and drive innovation. Our services include, but are not limited to, Digital and Technology advisory, Product and Platform development, and Data, Analytics and AI implementation.

What you'll do: Undertake complete ownership in accomplishing activities and assigned responsibilities across all phases of the project lifecycle to solve business problems across one or more client engagements. Apply appropriate development methodologies (e.g., agile, waterfall) and best practices (e.g., mid-development client reviews, embedded QA procedures, unit testing) to ensure successful and timely completion of assignments. Collaborate with other team members to leverage expertise and ensure seamless transitions. Exhibit flexibility in undertaking new and challenging problems and demonstrate excellent task management. Assist in creating project outputs such as business case development, solution vision and design, user requirements, prototypes, technical architecture (if needed), test cases, and operations management. Bring transparency in driving assigned tasks to completion and report accurate status. Bring a consulting mindset to problem solving and innovation by leveraging technical and business knowledge/expertise and collaborating across other teams. Assist senior team members and delivery leads in project management responsibilities.

What you'll bring:
Big Data Technologies: Proficiency in working with big data technologies, particularly in the context of Azure Databricks, which may include Apache Spark for distributed data processing.
Azure Databricks: In-depth knowledge of Azure Databricks for data engineering tasks, including data transformations, ETL processes, and job scheduling.
SQL and Query Optimization: Strong SQL skills for data manipulation and retrieval, along with the ability to optimize queries for performance in Snowflake.
ETL (Extract, Transform, Load): Expertise in designing and implementing ETL processes to move and transform data between systems, utilizing tools and frameworks available in Azure Databricks.
Data Integration: Experience with integrating diverse data sources into a cohesive and usable format, ensuring data quality and integrity.
Python/PySpark: Knowledge of programming languages like Python and PySpark for scripting and extending the functionality of Azure Databricks notebooks.
Version Control: Familiarity with version control systems, such as Git, for managing code and configurations in a collaborative environment.
Monitoring and Optimization: Ability to monitor data pipelines, identify bottlenecks, and optimize performance, including in Azure Data Factory.
Security and Compliance: Understanding of security best practices and compliance considerations when working with sensitive data in Azure and Snowflake environments.
Snowflake Data Warehouse: Experience in designing, implementing, and optimizing data warehouses using Snowflake, including schema design, performance tuning, and query optimization.
Healthcare Domain Knowledge: Familiarity with US health plan terminologies and datasets is essential.
Programming/Scripting Languages: Proficiency in Python, SQL, and PySpark is required.
Cloud Platforms: Experience with AWS or Azure, specifically in building data pipelines, is needed.
Cloud-Based Data Platforms: Working knowledge of Snowflake and Databricks is preferred.
Data Pipeline Orchestration: Experience with Azure Data Factory and AWS Glue for orchestrating data pipelines is necessary.
Relational Databases: Competency with relational databases such as PostgreSQL and MySQL is required, while experience with NoSQL databases is a plus.
BI Tools: Knowledge of BI tools such as Tableau and Power BI is expected.
Version Control: Proficiency with Git, including branching, merging, and pull requests, is required.
CI/CD for Data Pipelines: Experience in implementing continuous integration and delivery for data workflows using tools like Azure DevOps is essential.

Additional Skills: Experience with front-end technologies such as SQL, JavaScript, HTML, CSS, and Angular is advantageous. Familiarity with web development frameworks like Flask, Django, and FastAPI is beneficial. Basic knowledge of AWS CI/CD practices is a plus. Strong verbal and written communication skills with the ability to articulate results and issues to internal and client teams. Proven ability to work creatively and analytically in a problem-solving environment. Willingness to travel to other global offices as needed to work with client or other internal project teams.

Perks & Benefits: ZS offers a comprehensive total rewards package including health and well-being, financial planning, annual leave, personal growth and professional development. Our robust skills development programs, multiple career progression options, internal mobility paths and collaborative culture empower you to thrive as an individual and global team member. We are committed to giving our employees a flexible and connected way of working. A flexible and connected ZS allows us to combine work from home and on-site presence at clients/ZS offices for the majority of our week. The magic of ZS culture and innovation thrives in both planned and spontaneous face-to-face connections.

Travel: Travel is a requirement at ZS for client-facing ZSers; the business needs of your project and client are the priority. While some projects may be local, all client-facing ZSers should be prepared to travel as needed. Travel provides opportunities to strengthen client relationships, gain diverse experiences, and enhance professional growth by working in different environments and cultures.

Considering applying? At ZS, we're building a diverse and inclusive company where people bring their passions to inspire life-changing impact and deliver better outcomes for all. We are most interested in finding the best candidate for the job and recognize the value that candidates with all backgrounds, including non-traditional ones, bring. If you are interested in joining us, we encourage you to apply even if you don't meet 100% of the requirements listed above.

To Complete Your Application: Candidates must possess or be able to obtain work authorization for their intended country of employment. An online application, including a full set of transcripts (official or unofficial), is required to be considered.

Posted 6 days ago

Apply

4.0 - 9.0 years

6 - 12 Lacs

Hyderabad

Work from Office

Source: Naukri

ABOUT THE ROLE

Role Description: We are seeking an experienced MDM Senior Data Engineer with 6-9 years of experience and expertise in backend engineering to work closely with the business on the development and operations of our Master Data Management (MDM) platforms, with hands-on experience in Informatica or Reltio and data engineering experience. This role will also involve guiding junior data engineers/analysts and quality experts to deliver high-performance, scalable, and governed MDM solutions that align with enterprise data strategy. To succeed in this role, the candidate must have strong data engineering experience along with MDM knowledge; candidates with only MDM experience are not eligible for this role. The candidate must have data engineering experience with technologies such as SQL, Python, PySpark, Databricks, AWS, and API integrations, along with knowledge of MDM (Master Data Management).

Roles & Responsibilities: Develop the MDM backend solutions and implement ETL and data engineering pipelines using Databricks, AWS, Python/PySpark, SQL, etc. Lead the implementation and optimization of MDM solutions using Informatica or Reltio platforms. Perform data profiling and identify the DQ rules needed. Define and drive enterprise-wide MDM architecture, including IDQ, data stewardship, and metadata workflows. Manage cloud-based infrastructure using AWS and Databricks to ensure scalability and performance. Ensure data integrity, lineage, and traceability across MDM pipelines and solutions. Provide mentorship and technical leadership to junior team members and ensure project delivery timelines. Help the custom UI team integrate with backend data using APIs or other integration methods for a better data stewardship user experience.

Basic Qualifications and Experience: Master's degree with 4-6 years of experience in Business, Engineering, IT or a related field; OR Bachelor's degree with 6-9 years of experience in Business, Engineering, IT or a related field; OR Diploma with 10-12 years of experience in Business, Engineering, IT or a related field.

Functional Skills - Must-Have Skills: Strong understanding and hands-on experience with Databricks and AWS cloud services. Proficiency in Python, PySpark, SQL, and Unix for data processing and orchestration. Deep knowledge of MDM tools (Informatica, Reltio) and data quality frameworks (IDQ). Must have knowledge of customer master data (HCP, HCO, etc.). Experience with data modeling, governance, and DCR lifecycle management. Able to implement end-to-end integrations, including API-based, batch, and flat-file-based integrations. Strong experience with external data enrichments such as D&B. Strong experience with match/merge and survivorship rules implementations. Very good understanding of reference data and its integration with MDM. Hands-on experience with custom workflows and building data pipelines/orchestrations.

Good-to-Have Skills: Experience with Tableau or Power BI for reporting MDM insights. Exposure to or knowledge of Data Science and GenAI capabilities. Exposure to Agile practices and tools (JIRA, Confluence). Prior experience in Pharma/Life Sciences. Understanding of compliance and regulatory considerations in master data.

Professional Certifications: Any MDM certification (e.g., Informatica, Reltio). Databricks certifications (Data Engineer or Architect). Any cloud certification (AWS or Azure).

Soft Skills: Strong analytical abilities to assess and improve master data processes and solutions. Excellent verbal and written communication skills, with the ability to convey complex data concepts clearly to technical and non-technical stakeholders. Effective problem-solving skills to address data-related issues and implement scalable solutions. Ability to work effectively with global, virtual teams.

Posted 6 days ago

Apply

0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site

Source: LinkedIn

Role: MLOps Engineer
Location: Chennai - CKC
Mode of Interview: In Person
Date: 7th June 2025 (Saturday)

Keywords / Skillset: AWS SageMaker, Azure ML Studio, GCP Vertex AI, PySpark, Azure Databricks, MLflow, Kubeflow, Airflow, GitHub Actions, AWS CodePipeline, Kubernetes, AKS, Terraform, FastAPI

Responsibilities: Model deployment, model monitoring, model retraining; deployment, inference, monitoring, and retraining pipelines; drift detection (data drift, model drift); experiment tracking; MLOps architecture; REST API publishing.

Job Responsibilities: Research and implement MLOps tools, frameworks and platforms for our Data Science projects. Work on a backlog of activities to raise MLOps maturity in the organization. Proactively introduce a modern, agile and automated approach to Data Science. Conduct internal training and presentations about MLOps tools' benefits and usage.

Required Experience and Qualifications: Wide experience with Kubernetes. Experience in operationalization of Data Science projects (MLOps) using at least one of the popular frameworks or platforms (e.g., Kubeflow, AWS SageMaker, Google AI Platform, Azure Machine Learning, DataRobot, DKube). Good understanding of ML and AI concepts. Hands-on experience in ML model development. Proficiency in Python used both for ML and automation tasks. Good knowledge of Bash and the Unix command-line toolkit. Experience in CI/CD/CT pipeline implementation. Experience with cloud platforms - preferably AWS - would be an advantage.
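
A minimal MLflow experiment-tracking sketch matching the responsibilities listed above (experiment tracking and model logging ahead of deployment). The experiment name, toy data, and model choice are illustrative assumptions, not this team's actual setup.

```python
# Illustrative MLflow run: log parameters, a metric, and the trained model artifact.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("churn-baseline")          # hypothetical experiment name

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))

    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, "model")     # artifact a registry/deployment step can pick up
```

Logging the model as an artifact is what later lets a model registry or deployment pipeline promote and serve it.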

Posted 6 days ago

Apply

0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site

Source: LinkedIn

Company Overview: Viraaj HR Solutions is a leading recruitment firm in India, dedicated to connecting top talent with industry-leading companies. We focus on understanding the unique needs of each client, providing tailored HR solutions that enhance their workforce capabilities. Our mission is to empower organizations by bridging the gap between talent and opportunity. We value integrity, collaboration, and excellence in service delivery, ensuring a seamless experience for both candidates and employers.

Job Title: PySpark Data Engineer
Work Mode: On-Site
Location: India

Role Responsibilities: Design, develop, and maintain data pipelines using PySpark. Collaborate with data scientists and analysts to gather data requirements. Optimize data processing workflows for efficiency and performance. Implement ETL processes to integrate data from various sources. Create and maintain data models that support analytical reporting. Ensure data quality and accuracy through rigorous testing and validation. Monitor and troubleshoot production data pipelines to resolve issues. Work with SQL databases to extract and manipulate data as needed. Utilize cloud technologies for data storage and processing solutions. Participate in code reviews and provide constructive feedback. Document technical specifications and processes clearly for team reference. Stay updated with industry trends and emerging technologies in big data. Collaborate with cross-functional teams to deliver data solutions. Support data governance initiatives to ensure compliance. Provide training and mentorship to junior data engineers.

Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. Proven experience as a Data Engineer, preferably with PySpark. Strong understanding of data warehousing concepts and architecture. Hands-on experience with ETL tools and frameworks. Proficiency in SQL and NoSQL databases. Familiarity with cloud platforms like AWS, Azure, or Google Cloud. Experience with Python programming for data manipulation. Knowledge of data modeling techniques and best practices. Ability to work in a fast-paced environment and juggle multiple tasks. Excellent problem-solving skills and attention to detail. Strong communication and interpersonal skills. Ability to work independently and as part of a team. Experience in Agile methodologies and practices. Knowledge of data governance and compliance standards. Familiarity with BI tools such as Tableau or Power BI is a plus.

Skills: data modeling, Python programming, PySpark, BI tools, SQL proficiency, SQL, cloud technologies, NoSQL databases, ETL processes, data warehousing, Agile methodologies, cloud computing, data engineering
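
A small, illustrative PySpark ETL of the kind this role describes: extract from a SQL database, aggregate, and load to cloud storage. The JDBC connection details, table, columns, and output path are placeholder assumptions.

```python
# Illustrative PySpark ETL: JDBC extract -> transform -> parquet load.
# Connection details, table names, and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("customers-etl").getOrCreate()

customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")   # hypothetical connection
    .option("dbtable", "public.customers")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

daily_signups = (
    customers
    .withColumn("signup_date", F.to_date("created_at"))
    .groupBy("signup_date", "region")
    .agg(F.count("*").alias("new_customers"))
)

daily_signups.write.mode("overwrite").parquet("s3://example-curated/daily_signups/")
```

In a real deployment the appropriate JDBC driver would need to be available on the cluster; the sketch assumes it is.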

Posted 6 days ago

Apply

2.0 - 4.0 years

4 - 6 Lacs

Hyderabad

Work from Office

Source: Naukri

Overview: The Data Science Team works on developing Machine Learning (ML) and Artificial Intelligence (AI) projects. The specific scope of this role is to develop ML solutions in support of ML/AI projects using big analytics toolsets in a CI/CD environment. Analytics toolsets may include DS tools/Spark/Databricks and other technologies offered by Microsoft Azure or open-source toolsets. This role will also help automate the end-to-end cycle with Azure Pipelines. You will be part of a collaborative interdisciplinary team around data, where you will be responsible for our continuous delivery of statistical/ML models. You will work closely with process owners, product owners and final business users. This will give you the correct visibility and understanding of the criticality of your developments.

Responsibilities: Delivery of key Advanced Analytics/Data Science projects within time and budget, particularly around DevOps/MLOps and the Machine Learning models in scope. Active contributor to code and development in projects and services. Partner with data engineers to ensure data access for discovery and proper data preparation for model consumption. Partner with ML engineers working on industrialization. Communicate with business stakeholders in the process of service design, training and knowledge transfer. Support large-scale experimentation and build data-driven models. Refine requirements into modelling problems. Influence product teams through data-based recommendations. Research state-of-the-art methodologies. Create documentation for learnings and knowledge transfer. Create reusable packages or libraries. Ensure on-time and on-budget delivery that satisfies project requirements, while adhering to enterprise architecture standards. Leverage big data technologies to help process data and build scaled data pipelines (batch to real time). Implement the end-to-end ML lifecycle with Azure Databricks and Azure Pipelines. Automate ML model deployments.

Qualifications: BE/B.Tech in Computer Science, Maths, or technical fields. Overall 2-4 years of experience working as a Data Scientist. 2+ years' experience building solutions in the commercial or supply chain space. 2+ years working in a team to deliver production-level analytic solutions. Fluent in Git (version control); understanding of Jenkins and Docker is a plus. Fluent in SQL syntax. 2+ years' experience in statistical/ML techniques to solve supervised (regression, classification) and unsupervised problems. 2+ years' experience in developing business-problem-related statistical/ML modeling with industry tools, with a primary focus on Python or PySpark development.

Data Science: Hands-on experience and strong knowledge of building supervised and unsupervised machine learning models; knowledge of time series/demand forecast models is a plus.
Programming Skills: Hands-on experience in statistical programming languages like Python and PySpark, and database query languages like SQL.
Statistics: Good applied statistical skills, including knowledge of statistical tests, distributions, regression, and maximum likelihood estimators.
Cloud (Azure): Experience in Databricks and ADF is desirable; familiarity with Spark, Hive, Pig is an added advantage.

Business storytelling and communicating data insights in a business-consumable format; fluent in one visualization tool. Strong communication and organizational skills, with the ability to deal with ambiguity while juggling multiple priorities. Experience with Agile methodology for teamwork and analytics product creation.

Experience in Reinforcement Learning is a plus. Experience in simulation and optimization problems in any space is a plus. Experience with Bayesian methods is a plus. Experience with causal inference is a plus. Experience with NLP is a plus. Experience with Responsible AI is a plus. Experience with distributed machine learning is a plus. Experience in DevOps, with hands-on experience with one or more cloud service providers (AWS, GCP, Azure preferred). Model deployment experience is a plus. Experience with version control systems like GitHub and CI/CD tools. Experience in exploratory data analysis. Knowledge of MLOps/DevOps and deploying ML models is preferred. Experience using MLflow, Kubeflow, etc. is preferred. Experience executing and contributing to MLOps automation infrastructure is good to have. Exceptional analytical and problem-solving skills. Stakeholder engagement (BU, vendors). Experience building statistical models in the retail or supply chain space is a plus.
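
A short Spark ML sketch of the supervised-modelling work described above: assembling features and training a classifier with PySpark. The column names and toy data are assumptions for illustration only.

```python
# Illustrative Spark ML pipeline: feature assembly + logistic regression.
# Toy data and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("demand-classifier").getOrCreate()

train = spark.createDataFrame(
    [(10.0, 3.0, 1), (2.0, 8.0, 0), (7.5, 1.0, 1), (1.0, 9.5, 0)],
    ["promo_spend", "days_since_order", "label"],
)

assembler = VectorAssembler(
    inputCols=["promo_spend", "days_since_order"], outputCol="features"
)
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(train)
model.transform(train).select("label", "prediction").show()
```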

Posted 6 days ago

Apply

4.0 - 9.0 years

6 - 11 Lacs

Hyderabad

Work from Office

Source: Naukri

Overview: Provide data science/analytics support for the Perfect Store group, which works with AMESA Sectors and is part of the broader Global Capability Center in Hyderabad, India. This role will help enable accelerated growth for PepsiCo by building the Retailer Value Offer and Shopper Value Offer, aligning data, and applying advanced analytics approaches to drive actionable insights at the business unit and store level. Key responsibilities will be to build and manage advanced analytics deep dives in a cloud environment, manage data, and prepare data for use in advanced analytics, artificial intelligence, machine learning, and deep learning projects.

Responsibilities: Support the Perfect Store (Demand Accelerator) team with delivery of the Retail Value Offer and Shopper Value Offer framework for the AMESA sector. Work within a cloud environment (e.g., Microsoft Azure). Build and maintain code for use in advanced analytics, artificial intelligence, and machine learning projects. Clean and prepare data for use in advanced analytics, artificial intelligence, and machine learning projects. Build deep-dive analysis reports in the cloud environment (using PySpark and Python) to support BU asks. Develop, maintain, and apply statistical techniques to business questions, including distributions, outliers, visualizations, etc. Support relationships with the key end-user stakeholders in the AMESA Business Units. Own flawless execution and quality checks of analytics exercises. Manage multiple priorities, deadlines, and deliverables. Lead communication with business partners and potentially end users on matters such as available capacity, changes of scope of existing projects and planning of future projects. Deliver outputs in line with the agreed timelines and formats while updating existing project management tools. Flag and monitor any business risks related to delivering the requested outputs.

Qualifications: An experienced analytics professional with 4+ years of experience. Education: B.Tech or any bachelor's degree; a master's degree is optional. Proficient with Python, SQL, Excel and Power BI. Knowledge of machine learning algorithms is a plus. Retail experience is good to have. Strong collaborator: interested in and motivated by working with others; owns full responsibility for deliverables, quality-checks thoroughly, and looks for and works on process improvements; actively creates and participates in opportunities to co-create solutions across markets; willing and able to embrace Responsive Ways of Working.

Posted 6 days ago

Apply

0 years

0 Lacs

Bhubaneswar, Odisha, India

On-site

Source: LinkedIn

Company Overview: Viraaj HR Solutions is a leading recruitment firm in India, dedicated to connecting top talent with industry-leading companies. We focus on understanding the unique needs of each client, providing tailored HR solutions that enhance their workforce capabilities. Our mission is to empower organizations by bridging the gap between talent and opportunity. We value integrity, collaboration, and excellence in service delivery, ensuring a seamless experience for both candidates and employers.

Job Title: PySpark Data Engineer
Work Mode: On-Site
Location: India

Role Responsibilities: Design, develop, and maintain data pipelines using PySpark. Collaborate with data scientists and analysts to gather data requirements. Optimize data processing workflows for efficiency and performance. Implement ETL processes to integrate data from various sources. Create and maintain data models that support analytical reporting. Ensure data quality and accuracy through rigorous testing and validation. Monitor and troubleshoot production data pipelines to resolve issues. Work with SQL databases to extract and manipulate data as needed. Utilize cloud technologies for data storage and processing solutions. Participate in code reviews and provide constructive feedback. Document technical specifications and processes clearly for team reference. Stay updated with industry trends and emerging technologies in big data. Collaborate with cross-functional teams to deliver data solutions. Support data governance initiatives to ensure compliance. Provide training and mentorship to junior data engineers.

Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. Proven experience as a Data Engineer, preferably with PySpark. Strong understanding of data warehousing concepts and architecture. Hands-on experience with ETL tools and frameworks. Proficiency in SQL and NoSQL databases. Familiarity with cloud platforms like AWS, Azure, or Google Cloud. Experience with Python programming for data manipulation. Knowledge of data modeling techniques and best practices. Ability to work in a fast-paced environment and juggle multiple tasks. Excellent problem-solving skills and attention to detail. Strong communication and interpersonal skills. Ability to work independently and as part of a team. Experience in Agile methodologies and practices. Knowledge of data governance and compliance standards. Familiarity with BI tools such as Tableau or Power BI is a plus.

Skills: data modeling, Python programming, PySpark, BI tools, SQL proficiency, SQL, cloud technologies, NoSQL databases, ETL processes, data warehousing, Agile methodologies, cloud computing, data engineering

Posted 6 days ago

Apply

6.0 - 10.0 years

8 - 14 Lacs

Bengaluru

Work from Office

Source: Naukri

Work Location: Bangalore (CV Raman Nagar)
Notice Period: Immediate - 30 days
Mandatory Skills: Big Data, Python, SQL, Spark/PySpark, AWS Cloud

JD and Required Skills & Responsibilities:
- Actively participate in all phases of the software development lifecycle, including requirements gathering, functional and technical design, development, testing, roll-out, and support.
- Solve complex business problems by utilizing a disciplined development methodology.
- Produce scalable, flexible, efficient, and supportable solutions using appropriate technologies.
- Analyse the source and target system data and map the transformations that meet the requirements (a small source-vs-target reconciliation sketch follows this listing).
- Interact with the client and onsite coordinators during different phases of a project.
- Design and implement product features in collaboration with business and technology stakeholders.
- Anticipate, identify, and solve issues concerning data management to improve data quality.
- Clean, prepare, and optimize data at scale for ingestion and consumption.
- Support the implementation of new data management projects and re-structure the current data architecture.
- Implement automated workflows and routines using workflow scheduling tools.
- Understand and use continuous integration, test-driven development, and production deployment frameworks.
- Participate in design, code, and test-plan reviews and dataset implementation performed by other data engineers in support of maintaining data engineering standards.
- Analyze and profile data for the purpose of designing scalable solutions.
- Troubleshoot straightforward data issues and perform root cause analysis to proactively resolve product issues.

Required Skills:
- 5+ years of relevant experience developing data and analytics solutions.
- Experience building data lake solutions leveraging one or more of the following: AWS, EMR, S3, Hive and PySpark.
- Experience with relational SQL.
- Experience with scripting languages such as Python.
- Experience with source control tools such as GitHub and the related dev process.
- Experience with workflow scheduling tools such as Airflow.
- In-depth knowledge of AWS Cloud (S3, EMR, Databricks).
- Has a passion for data solutions.
- Has a strong problem-solving and analytical mindset.
- Working experience in the design, development, and testing of data pipelines.
- Experience working with Agile teams.
- Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders.
- Able to quickly pick up new programming languages, technologies, and frameworks.
- Bachelor's degree in computer science.
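
The source-vs-target reconciliation sketch referenced above: comparing row counts per partition between a raw S3 dataset and a curated Hive table with PySpark. Paths, table names, and the partition column are hypothetical.

```python
# Illustrative source-vs-target row-count reconciliation per load_date.
# Paths and table names are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("source-target-check")
    .enableHiveSupport()
    .getOrCreate()
)

source = spark.read.parquet("s3://example-raw/transactions/")
target = spark.table("curated.transactions")       # hypothetical Hive table

src_counts = source.groupBy("load_date").agg(F.count("*").alias("source_rows"))
tgt_counts = target.groupBy("load_date").agg(F.count("*").alias("target_rows"))

# Full outer join so partitions missing on either side also surface as mismatches.
mismatches = (
    src_counts.join(tgt_counts, "load_date", "full_outer")
    .filter(
        (F.col("source_rows") != F.col("target_rows"))
        | F.col("source_rows").isNull()
        | F.col("target_rows").isNull()
    )
)
mismatches.show()
```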

Posted 6 days ago

Apply

6.0 - 7.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site

Source: LinkedIn

Designation: Sr. Consultant
Experience: 6 to 7 years
Location: Bengaluru
Skills Required: Python, SQL, Databricks, ADF; within Databricks: DLT, PySpark, Structured Streaming, performance and cost optimization (see the streaming sketch below).

Roles and Responsibilities: Capture business problems, value drivers, and functional/non-functional requirements and translate them into functionality. Assess the risks, feasibility, opportunities, and business impact. Assess and model processes, data flows, and technology to understand the current value and issues, and identify opportunities for improvement. Create/update clear documentation of requirements to align with the solution over the project lifecycle. Ensure traceability of requirements from business needs through testing and scope changes to the final solution. Interact with software suppliers, designers and developers to understand software limitations, deliver elements of system and database design, and ensure that business requirements and use cases are handled. Configure and document software and processes, using agreed standards and tools. Create acceptance criteria and validate that solutions meet business needs through defining and coordinating testing. Create and present compelling business cases to justify solution value and establish approval, funding and prioritization. Initiate, plan, execute, monitor, and control Business Analysis activities on projects within agreed parameters of cost, time and quality. Lead stakeholder management activities and large design sessions. Lead teams to complete business analysis on projects. Prior Agile project experience is expected, including an understanding of Agile frameworks and tools and hands-on work in Agile teams.
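
The Structured Streaming sketch referenced in the skills list: an incremental PySpark read of JSON files written to a Delta table with checkpointing. The paths and schema are illustrative assumptions.

```python
# Illustrative PySpark Structured Streaming job: JSON files -> Delta table.
# Paths and schema are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("events-stream").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream
    .schema(schema)
    .json("/mnt/raw/events/")                        # hypothetical landing folder
)

query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events/")  # enables exactly-once restarts
    .outputMode("append")
    .start("/mnt/curated/events/")
)
```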

Posted 6 days ago

Apply

8.0 - 13.0 years

16 - 22 Lacs

Chennai, Bengaluru, Delhi / NCR

Work from Office

Source: Naukri

About the job:
Role: Senior Databricks Engineer / Databricks Technical Lead / Data Architect
Experience: 8-15 years
Location: Bangalore, Chennai, Delhi, Pune

Primary Roles and Responsibilities:
- Developing Modern Data Warehouse solutions using Databricks and the AWS/Azure stack
- Ability to provide forward-thinking solutions in the data engineering and analytics space
- Collaborate with DW/BI leads to understand new ETL pipeline development requirements
- Triage issues to find gaps in existing pipelines and fix the issues
- Work with the business to understand reporting-layer needs and develop data models to fulfill them
- Help junior team members resolve issues and technical challenges
- Drive technical discussions with the client architect and team members
- Orchestrate the data pipelines in a scheduler via Airflow

Skills and Qualifications:
- Bachelor's and/or master's degree in computer science or equivalent experience
- Must have 6+ years of total IT experience and 3+ years' experience in data warehouse/ETL projects
- Deep understanding of Star and Snowflake dimensional modelling
- Strong knowledge of Data Management principles
- Good understanding of the Databricks Data & AI platform and Databricks Delta Lake Architecture
- Should have hands-on experience in SQL, Python and Spark (PySpark)
- Candidate must have experience in the AWS/Azure stack
- Desirable to have ETL with batch and streaming (Kinesis)
- Experience in building ETL/data warehouse transformation processes
- Experience with Apache Kafka for use with streaming data/event-based data
- Experience with other open-source big data products, including Hadoop (Hive, Pig, Impala)
- Experience with open-source non-relational/NoSQL data repositories (MongoDB, Cassandra, Neo4J)
- Experience working with structured and unstructured data, including imaging and geospatial data
- Experience working in a DevOps environment with tools such as Terraform, CircleCI, Git
- Proficiency in RDBMS, complex SQL, PL/SQL, Unix shell scripting, performance tuning and troubleshooting
- Databricks Certified Data Engineer Associate/Professional certification (desirable)
- Comfortable working in a dynamic, fast-paced, innovative environment with several ongoing concurrent projects
- Should have experience working in Agile methodology
- Strong verbal and written communication skills
- Strong analytical and problem-solving skills with a high attention to detail

Location: Bangalore, Chennai, Delhi / NCR, Pune

Posted 6 days ago

Apply

8.0 - 13.0 years

25 - 40 Lacs

Bengaluru

Work from Office

Source: Naukri

Must-Have Skills:
- Azure Databricks / PySpark (hands-on)
- SQL/PL-SQL (advanced level)
- Snowflake - 2+ years
- Spark/data pipeline development - 2+ years
- Azure Repos / GitHub, Azure DevOps
- Unix shell scripting
- Cloud technology experience

Key Responsibilities:
1. Design, build, and manage data pipelines using Azure Databricks, PySpark, and Snowflake.
2. Analyze and resolve production issues (Tier 2 support with weekend/on-call rotation).
3. Write and optimize complex SQL/PL-SQL queries.
4. Collaborate on low-level and high-level design for data solutions.
5. Document all project deliverables and support deployment.

Good to Have: Knowledge of Oracle, Qlik Replicate, GoldenGate, Hadoop; job scheduler tools like Control-M or Airflow.

Behavioral: Strong problem-solving and communication skills.

Posted 6 days ago

Apply

3.0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Source: LinkedIn

Company Overview: Viraaj HR Solutions is dedicated to connecting top talent with forward-thinking companies. Our mission is to provide exceptional talent acquisition services while fostering a culture of trust, integrity, and collaboration. We prioritize our clients' needs and work tirelessly to ensure the ideal candidate-job match. Join us in our commitment to excellence and become part of a dynamic team focused on driving success for individuals and organizations alike.

Role Responsibilities: Design, develop, and implement data pipelines using Azure Data Factory. Create and maintain data models for structured and unstructured data. Extract, transform, and load (ETL) data from various sources into data warehouses. Develop analytical solutions and dashboards using Azure Databricks. Perform data integration and migration tasks with Azure tools. Ensure optimal performance and scalability of data solutions. Collaborate with cross-functional teams to understand data requirements. Utilize SQL Server for database management and data queries. Implement data quality checks and ensure data integrity. Work on data governance and compliance initiatives. Monitor and troubleshoot data pipeline issues to ensure reliability. Document data processes and architecture for future reference. Stay current with industry trends and Azure advancements. Train and mentor junior data engineers and team members. Participate in design reviews and provide feedback for process improvements.

Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. 3+ years of experience in a data engineering role. Strong expertise in Azure Data Factory and Azure Databricks. Proficient in SQL for data manipulation and querying. Experience with data warehousing concepts and practices. Familiarity with ETL tools and processes. Knowledge of Python or other programming languages for data processing. Ability to design scalable cloud architecture. Experience with data modeling and database design. Effective communication and collaboration skills. Strong analytical and problem-solving abilities. Familiarity with performance tuning and optimization techniques. Knowledge of data visualization tools is a plus. Experience with Agile methodologies. Ability to work independently and manage multiple tasks. Willingness to learn and adapt to new technologies.

Skills: ETL, Azure Databricks, SQL Server, Azure, data governance, Azure Data Factory, Python, data warehousing, data integration, performance tuning, Python scripting, SQL, data modeling, data migration, data visualization, analytical solutions, PySpark, Agile methodologies, data quality checks

Posted 6 days ago

Apply

0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Source: LinkedIn

Company Overview: Viraaj HR Solutions is a leading recruitment firm in India, dedicated to connecting top talent with industry-leading companies. We focus on understanding the unique needs of each client, providing tailored HR solutions that enhance their workforce capabilities. Our mission is to empower organizations by bridging the gap between talent and opportunity. We value integrity, collaboration, and excellence in service delivery, ensuring a seamless experience for both candidates and employers.

Job Title: PySpark Data Engineer
Work Mode: On-Site
Location: India

Role Responsibilities: Design, develop, and maintain data pipelines using PySpark. Collaborate with data scientists and analysts to gather data requirements. Optimize data processing workflows for efficiency and performance. Implement ETL processes to integrate data from various sources. Create and maintain data models that support analytical reporting. Ensure data quality and accuracy through rigorous testing and validation. Monitor and troubleshoot production data pipelines to resolve issues. Work with SQL databases to extract and manipulate data as needed. Utilize cloud technologies for data storage and processing solutions. Participate in code reviews and provide constructive feedback. Document technical specifications and processes clearly for team reference. Stay updated with industry trends and emerging technologies in big data. Collaborate with cross-functional teams to deliver data solutions. Support data governance initiatives to ensure compliance. Provide training and mentorship to junior data engineers.

Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. Proven experience as a Data Engineer, preferably with PySpark. Strong understanding of data warehousing concepts and architecture. Hands-on experience with ETL tools and frameworks. Proficiency in SQL and NoSQL databases. Familiarity with cloud platforms like AWS, Azure, or Google Cloud. Experience with Python programming for data manipulation. Knowledge of data modeling techniques and best practices. Ability to work in a fast-paced environment and juggle multiple tasks. Excellent problem-solving skills and attention to detail. Strong communication and interpersonal skills. Ability to work independently and as part of a team. Experience in Agile methodologies and practices. Knowledge of data governance and compliance standards. Familiarity with BI tools such as Tableau or Power BI is a plus.

Skills: data modeling, Python programming, PySpark, BI tools, SQL proficiency, SQL, cloud technologies, NoSQL databases, ETL processes, data warehousing, Agile methodologies, cloud computing, data engineering

Posted 6 days ago

Apply

3.0 - 6.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site

Source: LinkedIn

Responsibilities: Develop and execute test scripts to validate data pipelines, transformations, and integrations. Formulate and maintain test strategies, including smoke, performance, functional, and regression testing, to ensure data processing and ETL jobs meet requirements. Collaborate with development teams to assess changes in data workflows and update test cases to preserve data integrity. Design and run tests for data validation, storage, and retrieval using Azure services like Data Lake, Synapse, and Data Factory, adhering to industry standards. Continuously enhance automated tests as new features are developed, ensuring timely delivery per defined quality standards. Participate in data reconciliation and verify Data Quality frameworks to maintain data accuracy, completeness, and consistency across the platform. Share knowledge and best practices by collaborating with business analysts and technology teams to document testing processes and findings. Communicate testing progress effectively with stakeholders, highlighting issues or blockers, and ensuring alignment with business objectives. Maintain a comprehensive understanding of the Azure Data Lake platform's data landscape to ensure thorough testing coverage.

Skills & Experience: 3-6 years of QA experience with a strong focus on Big Data testing, particularly in Data Lake environments on Azure's cloud platform. Proficient in Azure Data Factory, Azure Synapse Analytics and Databricks for big data processing and scaled data quality checks. Proficiency in SQL, capable of writing and optimizing both simple and complex queries for data validation and testing purposes. Proficient in PySpark, with experience in data manipulation and transformation, and a demonstrated ability to write and execute test scripts for data processing and validation. Hands-on experience with functional and system integration testing in big data environments, ensuring seamless data flow and accuracy across multiple systems. Knowledge and ability to design and execute test cases in a behaviour-driven development environment. Fluency in Agile methodologies, with active participation in Scrum ceremonies and a strong understanding of Agile principles. Familiarity with tools like Jira, including experience with X-Ray or Jira Zephyr for defect management and test case management. Proven experience working on high-traffic and large-scale software products, ensuring data quality, reliability, and performance under demanding conditions.
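
An illustrative pytest-style check of the automated data-validation work this QA role describes: asserting that a PySpark transformation preserves row counts and introduces no nulls. The transformation under test and column names are assumptions.

```python
# Illustrative pytest checks for a PySpark transformation.
# The transformation and columns are hypothetical stand-ins for a real pipeline step.
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("dq-tests").getOrCreate()


def add_net_amount(df):
    # Example transformation under test.
    return df.withColumn("net_amount", F.col("amount") - F.col("discount"))


def test_net_amount_has_no_nulls(spark):
    df = spark.createDataFrame(
        [("o1", 100.0, 10.0), ("o2", 50.0, 0.0)],
        ["order_id", "amount", "discount"],
    )
    result = add_net_amount(df)

    assert result.count() == df.count()                          # no rows dropped
    assert result.filter(F.col("net_amount").isNull()).count() == 0
```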

Posted 6 days ago

Apply

6.0 - 8.0 years

8 - 10 Lacs

Kolkata

Work from Office


Job Summary:
We are seeking an experienced Data Engineer with strong expertise in Databricks, Python, PySpark, and Power BI, along with a solid background in data integration and the modern Azure ecosystem. The ideal candidate will play a critical role in designing, developing, and implementing scalable data engineering solutions and pipelines.

Key Responsibilities:
- Design, develop, and implement robust data solutions using Azure Data Factory, Databricks, and related data engineering tools.
- Build and maintain scalable ETL/ELT pipelines with a focus on performance and reliability.
- Write efficient and reusable code using Python and PySpark.
- Perform data cleansing, transformation, and migration across various platforms.
- Work hands-on with Azure Data Factory (ADF); at least 1.5 to 2 years of ADF experience is expected.
- Develop and optimize SQL queries and stored procedures, and manage large data sets using SQL Server, T-SQL, PL/SQL, etc.
- Collaborate with cross-functional teams to understand business requirements and provide data-driven solutions.
- Engage directly with clients and business stakeholders to gather requirements, suggest optimal solutions, and ensure successful delivery.
- Work with Power BI for basic reporting and data visualization tasks.
- Apply strong knowledge of data warehousing concepts, modern data platforms, and cloud-based analytics.
- Adhere to coding standards and best practices, including thorough documentation and testing (unit, integration, performance).
- Support the operations, maintenance, and enhancement of existing data pipelines and architecture.
- Estimate tasks and plan release cycles effectively.

Required Technical Skills:
- Languages & Frameworks: Python, PySpark
- Cloud & Tools: Azure Data Factory, Databricks, Azure ecosystem
- Databases: SQL Server, T-SQL, PL/SQL
- Reporting & BI Tools: Power BI (PBI)
- Data Concepts: Data Warehousing, ETL/ELT, Data Cleansing, Data Migration
- Other: Version control, Agile methodologies, good problem-solving skills

Preferred Qualifications:
- Experience with coding in Pysense within Databricks (added advantage)
- Solid understanding of cloud data architecture and analytics processes
- Ability to independently initiate and lead conversations with business stakeholders
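As a rough illustration of the Databricks-oriented cleansing work described above, the following sketch reads raw files, applies basic cleansing, and writes a Delta output. The storage account, container names, and columns are hypothetical, and the abfss paths assume a Databricks workspace already configured for ADLS access.

```python
# Sketch of a Databricks-style PySpark cleansing step writing to Delta.
# Storage paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("databricks_cleanse_sketch").getOrCreate()

# Read raw JSON landed by an upstream tool such as Azure Data Factory.
raw = spark.read.format("json").load("abfss://raw@account.dfs.core.windows.net/sales/")

cleansed = (
    raw
    .dropDuplicates(["transaction_id"])                         # de-duplicate on the business key
    .withColumn("sale_ts", F.to_timestamp("sale_ts"))           # standardise the timestamp
    .withColumn("country", F.upper(F.trim(F.col("country"))))   # normalise a reference field
    .filter(F.col("amount") > 0)                                 # drop obviously invalid records
)

# In Databricks this would typically land in a Delta table feeding Power BI models.
(cleansed.write
    .format("delta")
    .mode("overwrite")
    .save("abfss://curated@account.dfs.core.windows.net/sales_cleansed/"))
```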

Posted 6 days ago

Apply

0 years

0 Lacs

Kochi, Kerala, India

On-site


Company Overview
Viraaj HR Solutions is a leading recruitment firm in India, dedicated to connecting top talent with industry-leading companies. We focus on understanding the unique needs of each client, providing tailored HR solutions that enhance their workforce capabilities. Our mission is to empower organizations by bridging the gap between talent and opportunity. We value integrity, collaboration, and excellence in service delivery, ensuring a seamless experience for both candidates and employers.

Job Title: PySpark Data Engineer
Work Mode: On-Site
Location: India

Role Responsibilities
Design, develop, and maintain data pipelines using PySpark.
Collaborate with data scientists and analysts to gather data requirements.
Optimize data processing workflows for efficiency and performance.
Implement ETL processes to integrate data from various sources.
Create and maintain data models that support analytical reporting.
Ensure data quality and accuracy through rigorous testing and validation.
Monitor and troubleshoot production data pipelines to resolve issues.
Work with SQL databases to extract and manipulate data as needed.
Utilize cloud technologies for data storage and processing solutions.
Participate in code reviews and provide constructive feedback.
Document technical specifications and processes clearly for team reference.
Stay updated with industry trends and emerging technologies in big data.
Collaborate with cross-functional teams to deliver data solutions.
Support data governance initiatives to ensure compliance.
Provide training and mentorship to junior data engineers.

Qualifications
Bachelor's degree in Computer Science, Information Technology, or a related field.
Proven experience as a Data Engineer, preferably with PySpark.
Strong understanding of data warehousing concepts and architecture.
Hands-on experience with ETL tools and frameworks.
Proficiency in SQL and NoSQL databases.
Familiarity with cloud platforms such as AWS, Azure, or Google Cloud.
Experience with Python programming for data manipulation.
Knowledge of data modeling techniques and best practices.
Ability to work in a fast-paced environment and juggle multiple tasks.
Excellent problem-solving skills and attention to detail.
Strong communication and interpersonal skills.
Ability to work independently and as part of a team.
Experience in Agile methodologies and practices.
Knowledge of data governance and compliance standards.
Familiarity with BI tools such as Tableau or Power BI is a plus.

Skills: Data Modeling, Python Programming, PySpark, BI Tools, SQL, Cloud Technologies, NoSQL Databases, ETL Processes, Data Warehousing, Agile Methodologies, Cloud Computing, Data Engineering

Posted 6 days ago

Apply

5.0 - 10.0 years

16 - 25 Lacs

Hyderabad, Bengaluru

Work from Office


Urgent Hiring for PySpark Data Engineer

Job Location: Bangalore and Hyderabad
Experience: 5-9 yrs
Share CV: Mohini.sharma@adecco.com OR Call 9740521948

Job Description:
1. API Development: Design, develop, and maintain robust APIs using FastAPI and RESTful principles for scalable backend systems.
2. Big Data Processing: Leverage PySpark to process and analyze large datasets efficiently, ensuring optimal performance in big data environments.
3. Full-Stack Integration: Develop seamless backend-to-frontend feature integrations, collaborating with front-end developers for cohesive user experiences.
4. CI/CD Pipelines: Implement and manage CI/CD pipelines using GitHub Actions and Azure DevOps to streamline deployments and ensure system reliability.
5. Containerization: Utilize Docker for building and deploying containerized applications in development and production environments.
6. Team Leadership: Lead and mentor a team of developers, providing guidance, code reviews, and support to junior team members to ensure high-quality deliverables.
7. Code Optimization: Write clean, maintainable, and efficient Python code, with a focus on scalability, reusability, and performance.
8. Cloud Deployment: Deploy and manage applications on cloud platforms like Azure, ensuring high availability and fault tolerance.
9. Collaboration: Work closely with cross-functional teams, including product managers and designers, to translate business requirements into technical solutions.
10. Documentation: Maintain thorough documentation for APIs, processes, and systems to ensure transparency and ease of maintenance.

Highlighted Skillset:
Big Data: Strong PySpark skills for processing large datasets.
DevOps: Proficiency in GitHub Actions, CI/CD pipelines, Azure DevOps, and Docker.
Integration: Experience in backend-to-frontend feature connectivity.
Leadership: Proven ability to lead and mentor development teams.
Cloud: Knowledge of deploying and managing applications in Azure or other cloud environments.
Team Collaboration: Strong interpersonal and communication skills for working in cross-functional teams.
Best Practices: Emphasis on clean code, performance optimization, and robust documentation.
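To illustrate the FastAPI-plus-PySpark combination this posting emphasises, here is a minimal, hypothetical sketch. The dataset path and fields are invented, and a production service would normally run Spark jobs asynchronously rather than inside a request handler.

```python
# Sketch of a FastAPI endpoint backed by a PySpark aggregation.
# Paths and field names are hypothetical assumptions.
from fastapi import FastAPI
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

app = FastAPI(title="metrics-api-sketch")
spark = SparkSession.builder.appName("metrics_api_sketch").getOrCreate()

@app.get("/metrics/orders/{order_date}")
def order_metrics(order_date: str) -> dict:
    """Return simple order metrics for one day from a hypothetical Parquet dataset."""
    orders = spark.read.parquet("/data/curated/orders")
    row = (
        orders.filter(F.col("order_date") == order_date)
        .agg(F.count("order_id").alias("order_count"),
             F.sum("amount").alias("total_amount"))
        .first()
    )
    return {
        "order_date": order_date,
        "order_count": row["order_count"] or 0,
        "total_amount": float(row["total_amount"] or 0.0),
    }

# Run locally with: uvicorn main:app --reload   (assuming this file is main.py)
```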

Posted 6 days ago

Apply

4.0 - 9.0 years

8 - 18 Lacs

Navi Mumbai, Pune, Mumbai (All Areas)

Hybrid


Job Description:
We are seeking a highly skilled Data Engineer with expertise in SQL, Python, Data Warehousing, AWS, Airflow, ETL, and Data Modeling. The ideal candidate will be responsible for designing, developing, and maintaining robust data pipelines, ensuring efficient data processing and integration across various platforms. This role requires strong problem-solving skills, an analytical mindset, and a deep understanding of modern data engineering frameworks.

Key Responsibilities:
Design, develop, and optimize scalable data pipelines and ETL processes to support business intelligence, analytics, and operational data needs.
Build and maintain data models (conceptual, logical, and physical) to enhance data storage, retrieval, and transformation efficiency.
Develop, test, and optimize complex SQL queries for efficient data extraction, transformation, and loading (ETL).
Implement and manage data warehousing solutions (e.g., Snowflake, Redshift, BigQuery) for structured and unstructured data storage.
Work with AWS, Azure, and cloud-based data solutions to build high-performance data ecosystems.
Utilize Apache Airflow for orchestrating workflows and automating data pipeline execution.
Collaborate with cross-functional teams to understand business data requirements and ensure alignment with data strategies.
Ensure data integrity, security, and compliance with governance policies and best practices.
Monitor, troubleshoot, and improve the performance of existing data systems for scalability and reliability.
Stay updated with emerging data engineering technologies, frameworks, and best practices to drive continuous improvement.

Required Skills & Qualifications:
Proficiency in SQL for query development, performance tuning, and optimization.
Strong Python programming skills for data processing, automation, and scripting.
Hands-on experience with ETL development, data integration, and transformation workflows.
Expertise in data modeling for efficient database and data warehouse design.
Experience with cloud platforms such as AWS (S3, Redshift, Lambda), Azure, or GCP.
Working knowledge of Airflow or similar workflow orchestration tools.
Familiarity with Big Data frameworks such as Hadoop or Spark (preferred but not mandatory).
Strong problem-solving skills and ability to work in a fast-paced, dynamic environment.
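Since the role highlights Airflow orchestration, a minimal DAG sketch is shown below. The task bodies are placeholders and the DAG id, schedule, and names are illustrative assumptions only.

```python
# Sketch of an Airflow DAG orchestrating a simple extract -> transform -> load
# flow of the kind described above. Task logic and names are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull data from a source system (e.g. an S3 bucket or an API).
    print("extracting raw data")


def transform(**context):
    # Placeholder: run a PySpark or SQL transformation on the extracted data.
    print("transforming data")


def load(**context):
    # Placeholder: load curated data into the warehouse (e.g. Redshift/Snowflake).
    print("loading into warehouse")


with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # use schedule_interval on Airflow versions before 2.4
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```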

Posted 6 days ago

Apply

Exploring PySpark Jobs in India

PySpark, the Python API for Apache Spark, is in high demand in the Indian job market. With the increasing need for big data processing and analysis, companies are actively seeking professionals with PySpark skills to join their teams. If you are a job seeker looking to excel in the field of big data and analytics, exploring PySpark jobs in India could be a great career move.

Top Hiring Locations in India

Here are five major cities in India where companies are actively hiring for PySpark roles:
1. Bangalore
2. Pune
3. Hyderabad
4. Mumbai
5. Delhi

Average Salary Range

The estimated salary range for PySpark professionals in India varies based on experience levels. Entry-level positions can expect to earn around INR 6-8 lakhs per annum, while experienced professionals can earn upwards of INR 15 lakhs per annum.

Career Path

In the field of PySpark, a typical career progression may look like this:
1. Junior Developer
2. Data Engineer
3. Senior Developer
4. Tech Lead
5. Data Architect

Related Skills

In addition to PySpark, professionals in this field are often expected to have or develop skills in:
  • Python programming
  • Apache Spark
  • Big data technologies (Hadoop, Hive, etc.)
  • SQL
  • Data visualization tools (Tableau, Power BI)

Interview Questions

Here are 25 interview questions you may encounter when applying for PySpark roles; a short code sketch after the list illustrates several of these concepts:

  • Explain what PySpark is and its main features (basic)
  • What are the advantages of using PySpark over other big data processing frameworks? (medium)
  • How do you handle missing or null values in PySpark? (medium)
  • What is RDD in PySpark? (basic)
  • What is a DataFrame in PySpark and how is it different from an RDD? (medium)
  • How can you optimize performance in PySpark jobs? (advanced)
  • Explain the difference between map and flatMap transformations in PySpark (basic)
  • What is the role of a SparkContext in PySpark? (basic)
  • How do you handle schema inference in PySpark? (medium)
  • What is a SparkSession in PySpark? (basic)
  • How do you join DataFrames in PySpark? (medium)
  • Explain the concept of partitioning in PySpark (medium)
  • What is a UDF in PySpark? (medium)
  • How do you cache DataFrames in PySpark for optimization? (medium)
  • Explain the concept of lazy evaluation in PySpark (medium)
  • How do you handle skewed data in PySpark? (advanced)
  • What is checkpointing in PySpark and how does it help in fault tolerance? (advanced)
  • How do you tune the performance of a PySpark application? (advanced)
  • Explain the use of Accumulators in PySpark (advanced)
  • How do you handle broadcast variables in PySpark? (advanced)
  • What are the different data sources supported by PySpark? (medium)
  • How can you run PySpark on a cluster? (medium)
  • What is the purpose of the PySpark MLlib library? (medium)
  • How do you handle serialization and deserialization in PySpark? (advanced)
  • What are the best practices for deploying PySpark applications in production? (advanced)
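To make a few of these concepts concrete, here is a small, self-contained PySpark sketch covering DataFrame creation, a broadcast join, caching, and lazy evaluation. The data values are made up purely for illustration.

```python
# Small PySpark sketch touching several interview concepts: DataFrame creation,
# a broadcast join, caching, and lazy evaluation. Data is invented.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("interview_concepts_sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, "IN", 120.0), (2, "US", 80.0), (3, "IN", 45.0)],
    ["order_id", "country_code", "amount"],
)
countries = spark.createDataFrame(
    [("IN", "India"), ("US", "United States")],
    ["country_code", "country_name"],
)

# Broadcast join: the small dimension table is shipped to every executor.
enriched = orders.join(F.broadcast(countries), on="country_code", how="left")

# Caching: keep the joined result in memory because it is reused below.
enriched.cache()

# Lazy evaluation: nothing has executed yet; these actions trigger the work.
enriched.groupBy("country_name").agg(F.sum("amount").alias("revenue")).show()
print("rows:", enriched.count())

spark.stop()
```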

Closing Remark

As you explore PySpark jobs in India, remember to prepare thoroughly for interviews and showcase your expertise confidently. With the right skills and knowledge, you can excel in this field and advance your career in the world of big data and analytics. Good luck!


Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot


Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies