
4025 PySpark Jobs - Page 26

JobPe aggregates listings for easy access, but applications are submitted directly on the original job portal.

6.0 - 10.0 years

14 - 24 Lacs

Hyderabad

Work from Office


Role & responsibilities
Job Title: Data Engineer
Years of experience: 6 to 10 years (minimum 5 years of relevant experience)
Work Mode: Work From Office, Hyderabad
Notice Period: Immediate to 30 days only
Key Skills: Python, SQL, AWS, Spark, Databricks (mandatory); Airflow (good to have)

Posted 6 days ago

Apply

2.0 - 10.0 years

0 Lacs

Noida, Uttar Pradesh, India

On-site


About Veersa: Veersa Technologies is a US-based IT services and AI enablement company founded in 2020, with a global delivery center in Noida (Sector 142). Founded by industry leaders, the company has grown 85% year over year, has been profitable since inception, and has a team of almost 400 professionals that is growing rapidly.

Our services include: Digital & Software Solutions (product development, legacy modernization, support); Data Engineering & AI Analytics (predictive analytics, AI/ML use cases, data visualization); Tools & Accelerators (AI/ML-embedded tools that integrate with client systems); and Tech Portfolio Assessment (TCO analysis, modernization roadmaps, etc.).

Tech stack: AI/ML, IoT, Blockchain, MEAN/MERN stack, Python, GoLang, RoR, Java Spring Boot, Node.js. Databases: PostgreSQL, MySQL, MS SQL, Oracle. Cloud: AWS & Azure (serverless architecture). Website: https://veersatech.com. LinkedIn: feel free to explore our company profile.

About the Role: We are seeking highly skilled and experienced Data Engineers and Lead Data Engineers to join our growing data team. This role is ideal for professionals with 2 to 10 years of experience in data engineering and a strong foundation in SQL, Databricks, Spark SQL, PySpark, and BI tools such as Power BI or Tableau. As a Data Engineer, you will build scalable data pipelines, optimize data processing workflows, and enable insightful reporting and analytics across the organization.

Key Responsibilities: Design and develop robust, scalable data pipelines using PySpark and Databricks. Write efficient SQL and Spark SQL queries for data transformation and analysis. Work closely with BI teams to enable reporting through Power BI or Tableau. Optimize the performance of big data workflows and ensure data quality. Collaborate with business and technical stakeholders to gather and translate data requirements. Implement best practices for data integration, processing, and governance.

Required Qualifications: Bachelor's degree in Computer Science, Engineering, or a related field. 2-10 years of experience in data engineering or a similar role. Strong experience with SQL, Spark SQL, and PySpark. Hands-on experience with Databricks for big data processing. Proven experience with BI tools such as Power BI and/or Tableau. Strong understanding of data warehousing and ETL/ELT concepts. Good problem-solving skills and the ability to work in cross-functional teams.

Nice to Have: Experience with cloud data platforms (Azure, AWS, or GCP). Familiarity with CI/CD pipelines and version control tools (e.g., Git). Understanding of data governance, security, and compliance standards. Exposure to data lake architectures and real-time streaming data pipelines.

Posted 6 days ago

Apply

2.0 - 10.0 years

0 Lacs

Ghaziabad, Uttar Pradesh, India

On-site


About Veersa: Veersa Technologies is a US-based IT services and AI enablement company founded in 2020, with a global delivery center in Noida (Sector 142). Founded by industry leaders, the company has grown 85% year over year, has been profitable since inception, and has a team of almost 400 professionals that is growing rapidly.

Our services include: Digital & Software Solutions (product development, legacy modernization, support); Data Engineering & AI Analytics (predictive analytics, AI/ML use cases, data visualization); Tools & Accelerators (AI/ML-embedded tools that integrate with client systems); and Tech Portfolio Assessment (TCO analysis, modernization roadmaps, etc.).

Tech stack: AI/ML, IoT, Blockchain, MEAN/MERN stack, Python, GoLang, RoR, Java Spring Boot, Node.js. Databases: PostgreSQL, MySQL, MS SQL, Oracle. Cloud: AWS & Azure (serverless architecture). Website: https://veersatech.com. LinkedIn: feel free to explore our company profile.

About the Role: We are seeking highly skilled and experienced Data Engineers and Lead Data Engineers to join our growing data team. This role is ideal for professionals with 2 to 10 years of experience in data engineering and a strong foundation in SQL, Databricks, Spark SQL, PySpark, and BI tools such as Power BI or Tableau. As a Data Engineer, you will build scalable data pipelines, optimize data processing workflows, and enable insightful reporting and analytics across the organization.

Key Responsibilities: Design and develop robust, scalable data pipelines using PySpark and Databricks. Write efficient SQL and Spark SQL queries for data transformation and analysis. Work closely with BI teams to enable reporting through Power BI or Tableau. Optimize the performance of big data workflows and ensure data quality. Collaborate with business and technical stakeholders to gather and translate data requirements. Implement best practices for data integration, processing, and governance.

Required Qualifications: Bachelor's degree in Computer Science, Engineering, or a related field. 2-10 years of experience in data engineering or a similar role. Strong experience with SQL, Spark SQL, and PySpark. Hands-on experience with Databricks for big data processing. Proven experience with BI tools such as Power BI and/or Tableau. Strong understanding of data warehousing and ETL/ELT concepts. Good problem-solving skills and the ability to work in cross-functional teams.

Nice to Have: Experience with cloud data platforms (Azure, AWS, or GCP). Familiarity with CI/CD pipelines and version control tools (e.g., Git). Understanding of data governance, security, and compliance standards. Exposure to data lake architectures and real-time streaming data pipelines.

Posted 6 days ago

Apply

2.0 - 10.0 years

0 Lacs

Delhi, India

On-site


About Veersa: Veersa Technologies is a US-based IT services and AI enablement company founded in 2020, with a global delivery center in Noida (Sector 142). Founded by industry leaders, the company has grown 85% year over year, has been profitable since inception, and has a team of almost 400 professionals that is growing rapidly.

Our services include: Digital & Software Solutions (product development, legacy modernization, support); Data Engineering & AI Analytics (predictive analytics, AI/ML use cases, data visualization); Tools & Accelerators (AI/ML-embedded tools that integrate with client systems); and Tech Portfolio Assessment (TCO analysis, modernization roadmaps, etc.).

Tech stack: AI/ML, IoT, Blockchain, MEAN/MERN stack, Python, GoLang, RoR, Java Spring Boot, Node.js. Databases: PostgreSQL, MySQL, MS SQL, Oracle. Cloud: AWS & Azure (serverless architecture). Website: https://veersatech.com. LinkedIn: feel free to explore our company profile.

About the Role: We are seeking highly skilled and experienced Data Engineers and Lead Data Engineers to join our growing data team. This role is ideal for professionals with 2 to 10 years of experience in data engineering and a strong foundation in SQL, Databricks, Spark SQL, PySpark, and BI tools such as Power BI or Tableau. As a Data Engineer, you will build scalable data pipelines, optimize data processing workflows, and enable insightful reporting and analytics across the organization.

Key Responsibilities: Design and develop robust, scalable data pipelines using PySpark and Databricks. Write efficient SQL and Spark SQL queries for data transformation and analysis. Work closely with BI teams to enable reporting through Power BI or Tableau. Optimize the performance of big data workflows and ensure data quality. Collaborate with business and technical stakeholders to gather and translate data requirements. Implement best practices for data integration, processing, and governance.

Required Qualifications: Bachelor's degree in Computer Science, Engineering, or a related field. 2-10 years of experience in data engineering or a similar role. Strong experience with SQL, Spark SQL, and PySpark. Hands-on experience with Databricks for big data processing. Proven experience with BI tools such as Power BI and/or Tableau. Strong understanding of data warehousing and ETL/ELT concepts. Good problem-solving skills and the ability to work in cross-functional teams.

Nice to Have: Experience with cloud data platforms (Azure, AWS, or GCP). Familiarity with CI/CD pipelines and version control tools (e.g., Git). Understanding of data governance, security, and compliance standards. Exposure to data lake architectures and real-time streaming data pipelines.

Posted 6 days ago

Apply

0.0 - 2.0 years

0 Lacs

Pune, Maharashtra, India

On-site


The Data Analytics Analyst 2 is a developing professional role. Applies specialty area knowledge in monitoring, assessing, analyzing and/or evaluating processes and data. Identifies policy gaps and formulates policies. Interprets data and makes recommendations. Researches and interprets factual information. Identifies inconsistencies in data or results, defines business issues and formulates recommendations on policies, procedures or practices. Integrates established disciplinary knowledge within own specialty area with a basic understanding of related industry practices. Good understanding of how the team interacts with others in accomplishing the objectives of the area. Develops working knowledge of industry practices and standards. Limited but direct impact on the business through the quality of the tasks/services provided; the impact of the job holder is restricted to their own team.

Responsibilities: Identifies policy gaps and formulates policies. Interprets data and makes recommendations. Integrates established disciplinary knowledge within own specialty area with a basic understanding of related industry practices. Makes judgments and recommendations based on analysis and specialty area knowledge. Researches and interprets factual information. Identifies inconsistencies in data or results, defines business issues and formulates recommendations on policies, procedures or practices. Exchanges information in a concise and logical way and is sensitive to audience diversity. Appropriately assesses risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.

Qualifications: 0-2 years of experience using tools for statistical modeling of large data sets.

Education: Bachelor's/University degree or equivalent experience. This job description provides a high-level review of the types of work performed; other job-related duties may be assigned as required.

Additional requirements: Experience as a Python developer with expertise in automation testing to design, develop, and automate robust software solutions and testing frameworks such as Pytest and Behave. 2-4 years of experience as a Big Data Engineer developing, optimizing, and managing large-scale data processing systems and analytics platforms. 3-4 years of experience in distributed data processing and near-real-time data analytics using PySpark. Familiarity with CI/CD pipelines, version control systems (e.g., Git), and DevOps practices.

Job Family Group: Technology
Job Family: Data Analytics
Time Type: Full time
Most Relevant Skills: Please see the requirements listed above.
Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. View Citi's EEO Policy Statement and the Know Your Rights poster.

Posted 6 days ago

Apply

1.0 - 3.0 years

7 - 10 Lacs

Pune

Hybrid


Role & responsibilities: Develop and maintain data pipelines under senior guidance. Support data integration and basic analysis tasks. Work with Databricks (SQL, PySpark) and AWS services. Contribute to Agile and TDD-based development practices. Good Python programming skills, familiarity with Databricks (SQL, PySpark) and AWS basics (S3, EC2), and Agile team experience are expected.

Preferred candidate profile (good to have): Exposure to ETL processes and data management. Experience with Databricks Workflows. AWS/Databricks certifications (nice to have).

Software skills: Python, Databricks (SQL, PySpark), AWS basics (S3, EC2), JIRA, Git.

Posted 6 days ago

Apply

12.0 - 18.0 years

0 Lacs

Tamil Nadu, India

Remote


Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all. This position requires expertise in designing, developing, debugging, and maintaining AI-powered applications and data engineering workflows for both local and cloud environments. The role involves working on large-scale projects, optimizing AI/ML pipelines, and ensuring scalable data infrastructure. As a PMTS, you will be responsible for integrating Generative AI (GenAI) capabilities, building data pipelines for AI model training, and deploying scalable AI-powered microservices. You will collaborate with AI/ML, Data Engineering, DevOps, and Product teams to deliver impactful solutions that enhance our products and services. Additionally, it would be desirable if the candidate has experience in retrieval-augmented generation (RAG), fine-tuning pre-trained LLMs, AI model evaluation, data pipeline automation, and optimizing cloud-based AI deployments. Responsibilities AI-Powered Software Development & API Integration Develop AI-driven applications, microservices, and automation workflows using FastAPI, Flask, or Django, ensuring cloud-native deployment and performance optimization. Integrate OpenAI APIs (GPT models, Embeddings, Function Calling) and Retrieval-Augmented Generation (RAG) techniques to enhance AI-powered document retrieval, classification, and decision-making. Data Engineering & AI Model Performance Optimization Design, build, and optimize scalable data pipelines for AI/ML workflows using Pandas, PySpark, and Dask, integrating data sources such as Kafka, AWS S3, Azure Data Lake, and Snowflake. Enhance AI model inference efficiency by implementing vector retrieval using FAISS, Pinecone, or ChromaDB, and optimize API latency with tuning techniques (temperature, top-k sampling, max tokens settings). Microservices, APIs & Security Develop scalable RESTful APIs for AI models and data services, ensuring integration with internal and external systems while securing API endpoints using OAuth, JWT, and API Key Authentication. Implement AI-powered logging, observability, and monitoring to track data pipelines, model drift, and inference accuracy, ensuring compliance with AI governance and security best practices. AI & Data Engineering Collaboration Work with AI/ML, Data Engineering, and DevOps teams to optimize AI model deployments, data pipelines, and real-time/batch processing for AI-driven solutions. Engage in Agile ceremonies, backlog refinement, and collaborative problem-solving to scale AI-powered workflows in areas like fraud detection, claims processing, and intelligent automation. Cross-Functional Coordination and Communication Collaborate with Product, UX, and Compliance teams to align AI-powered features with user needs, security policies, and regulatory frameworks (HIPAA, GDPR, SOC2). Ensure seamless integration of structured and unstructured data sources (SQL, NoSQL, vector databases) to improve AI model accuracy and retrieval efficiency. Mentorship & Knowledge Sharing Mentor junior engineers on AI model integration, API development, and scalable data engineering best practices, and conduct knowledge-sharing sessions. Education & Experience Required 12-18 years of experience in software engineering or AI/ML development, preferably in AI-driven solutions. Hands-on experience with Agile development, SDLC, CI/CD pipelines, and AI model deployment lifecycles. 
Bachelor’s Degree or equivalent in Computer Science, Engineering, Data Science, or a related field. Proficiency in full-stack development with expertise in Python (preferred for AI), Java Experience with structured & unstructured data: SQL (PostgreSQL, MySQL, SQL Server) NoSQL (OpenSearch, Redis, Elasticsearch) Vector Databases (FAISS, Pinecone, ChromaDB) Cloud & AI Infrastructure AWS: Lambda, SageMaker, ECS, S3 Azure: Azure OpenAI, ML Studio GenAI Frameworks & Tools: OpenAI API, Hugging Face Transformers, LangChain, LlamaIndex, AutoGPT, CrewAI. Experience in LLM deployment, retrieval-augmented generation (RAG), and AI search optimization. Proficiency in AI model evaluation (BLEU, ROUGE, BERT Score, cosine similarity) and responsible AI deployment. Strong problem-solving skills, AI ethics awareness, and the ability to collaborate across AI, DevOps, and data engineering teams. Curiosity and eagerness to explore new AI models, tools, and best practices for scalable GenAI adoption. About Athenahealth Here’s our vision: To create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all. What’s unique about our locations? From an historic, 19th century arsenal to a converted, landmark power plant, all of athenahealth’s offices were carefully chosen to represent our innovative spirit and promote the most positive and productive work environment for our teams. Our 10 offices across the United States and India — plus numerous remote employees — all work to modernize the healthcare experience, together. Our Company Culture Might Be Our Best Feature. We don't take ourselves too seriously. But our work? That’s another story. athenahealth develops and implements products and services that support US healthcare: It’s our chance to create healthier futures for ourselves, for our family and friends, for everyone. Our vibrant and talented employees — or athenistas, as we call ourselves — spark the innovation and passion needed to accomplish our goal. We continue to expand our workforce with amazing people who bring diverse backgrounds, experiences, and perspectives at every level, and foster an environment where every athenista feels comfortable bringing their best selves to work. Our size makes a difference, too: We are small enough that your individual contributions will stand out — but large enough to grow your career with our resources and established business stability. Giving back is integral to our culture. Our athenaGives platform strives to support food security, expand access to high-quality healthcare for all, and support STEM education to develop providers and technologists who will provide access to high-quality healthcare for all in the future. As part of the evolution of athenahealth’s Corporate Social Responsibility (CSR) program, we’ve selected nonprofit partners that align with our purpose and let us foster long-term partnerships for charitable giving, employee volunteerism, insight sharing, collaboration, and cross-team engagement. What can we do for you? Along with health and financial benefits, athenistas enjoy perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces — some offices even welcome dogs. In addition to our traditional benefits and perks, we sponsor events throughout the year, including book clubs, external speakers, and hackathons. 
And we provide athenistas with a company culture based on learning, the support of an engaged team, and an inclusive environment where all employees are valued. We also encourage a better work-life balance for athenistas with our flexibility. While we know in-office collaboration is critical to our vision, we recognize that not all work needs to be done within an office environment, full-time. With consistent communication and digital collaboration tools, athenahealth enables employees to find a balance that feels fulfilling and productive for each individual situation.

Posted 6 days ago

Apply

3.0 years

0 Lacs

Ahmedabad, Gujarat, India

Remote


Location: Remote/Hybrid (India-based preferred). Type: Full-Time.

Must Haves (don't apply if you miss any): 3+ years of experience in data engineering. Proven hands-on experience with ETL pipelines (end-to-end ownership). AWS resources: deep experience with EC2, Athena, Lambda, and Step Functions (non-negotiable; critical to the role). Strong MySQL skills (not negotiable). Docker (setup, deployment, troubleshooting).

Good To Have (adds major value): Airflow or any modern orchestration tool. PySpark experience. Python ecosystem: SQLAlchemy, DuckDB, PyArrow, Pandas, NumPy, DLT (Data Load Tool).

About You: You're a builder, not just a maintainer. You can work independently but communicate crisply. You thrive in fast-moving startup environments. You care about ownership and impact, not just code. Include the code word "Red Panda" in your application message so that we know you have read this section.

What You'll Do: Architect, build, and optimize robust data pipelines and workflows. Own AWS resource configuration, optimization, and troubleshooting. Collaborate with product and engineering teams to deliver business impact fast. Automate and scale data processes; no manual-work culture. Shape the data foundation for real business decisions.

Cut to the chase. Only serious, relevant applicants will be considered.

Posted 6 days ago

Apply

8.0 years

0 Lacs

Hyderabad, Telangana, India

On-site


TJX Companies: At TJX Companies, every day brings new opportunities for growth, exploration, and achievement. You'll be part of our vibrant team that embraces diversity, fosters collaboration, and prioritizes your development. Whether you're working in our four global Home Offices, Distribution Centers or Retail Stores (TJ Maxx, Marshalls, HomeGoods, Homesense, Sierra, Winners, and TK Maxx), you'll find abundant opportunities to learn, thrive, and make an impact. Come join our TJX family, a Fortune 100 company and the world's leading off-price retailer.

Job Description - About TJX: TJX is a Fortune 100 company that operates off-price retailers of apparel and home fashions. TJX India - Hyderabad is the IT home office in the global technology organization of off-price apparel and home fashion retailer TJX, established to deliver innovative solutions that help transform operations globally. At TJX, we strive to build a workplace where our Associates' contributions are welcomed and embedded in our purpose to provide excellent value to our customers every day. At TJX India, we take a long-term view of your career. We have a high-performance culture that rewards Associates with career growth opportunities, preferred assignments, and upward career advancement. We take well-being very seriously and are committed to offering a great work-life balance for all our Associates.

What You'll Discover: An inclusive culture and career growth opportunities. A truly global IT organization that collaborates across North America, Europe, Asia and Australia. A challenging, collaborative, and team-based environment.

What You'll Do: The Global Supply Chain - Logistics Team is responsible for managing various supply chain logistics related solutions within TJX IT. The organization delivers capabilities that enrich the customer experience and provide business value. We seek a motivated, talented Staff Engineer with a good understanding of cloud, database and BI concepts to help architect enterprise reporting solutions across global buying, planning and allocations.

What You'll Need: The Global Supply Chain - Logistics Team thrives on strong relationships with our business partners and works diligently to address their needs, which supports TJX growth and operational stability. On this tightly knit and fast-paced solution delivery team you will be constantly challenged to stretch and think outside the box. You will work with product teams, architecture and business partners to strategically plan and deliver product features by connecting the technical and business worlds. You will need to break down complex problems into steps that drive product development while keeping product quality and security as the priority. You will be responsible for most architecture, design and technical decisions within the assigned scope.

Key Responsibilities: Design, develop, test and deploy AI solutions using Azure AI services to meet business requirements, working collaboratively with architects and other engineers. Train, fine-tune, and evaluate AI models, including large language models (LLMs), ensuring they meet performance criteria and integrate seamlessly into new or existing solutions. Develop and integrate APIs to enable smooth interaction between AI models and other applications, facilitating efficient model serving. Collaborate effectively with cross-functional teams, including data scientists, software engineers, and business stakeholders, to deliver comprehensive AI solutions.
Optimize AI and ML model performance through techniques such as hyperparameter tuning and model compression to enhance efficiency and effectiveness. Monitor and maintain AI systems, providing technical support and troubleshooting to ensure continuous operation and reliability. Create comprehensive documentation for AI solutions, including design documents, user guides, and operational procedures, to support development and maintenance. Stay updated with the latest advancements in AI, machine learning, and cloud technologies, demonstrating a commitment to continuous learning and improvement. Design, code, deploy, and support software components, working collaboratively with AI architects and engineers to build impactful systems and services. Lead medium complex initiatives, prioritizing and assigning tasks, providing guidance, and resolving issues to ensure successful project delivery. Minimum Qualifications Bachelor's degree in computer science, engineering, or related field 8+ years of experience in data/software engineering, design, implementation and architecture. At least 5+ years of hands-on experience in developing AI/ML solutions, with a focus on deploying them in a cloud environment. Deep understanding of AI and ML algorithms with focus on Operations Research / Optimization knowledge (preferably Metaheuristics / Genetic Algorithms). Strong programming skills in Python with advanced OOPS concepts. Good understanding of structured, semi structured, and unstructured data, Data modelling, Data analysis, ETL and ELT. Proficiency with Databricks & PySpark. Experience with MLOps practices including CI/CD for machine learning models. Knowledge of security best practices for deploying AI solutions, including data encryption and access control. Knowledge of ethical considerations in AI, including bias detection and mitigation strategies. This role operates in an Agile/Scrum environment and requires a solid understanding of the full software lifecycle, including functional requirement gathering, design and development, testing of software applications, and documenting requirements and technical specifications. Fully Owns Epics with decreasing guidance. Takes initiative through identifying gaps and opportunities. Strong communication and influence skills. Solid team leadership with mentorship skills Ability to understand the work environment and competing priorities in conjunction with developing/meeting project goals. Shows a positive, open-minded, and can-do attitude. Experience In The Following Technologies Advanced Python programming (OOPS) Operations Research / Optimization knowledge (preferably Metaheuristics / Genetic Algorithms) Databricks with Pyspark Azure / Cloud knowledge Github / version control Functional knowledge on Supply Chain / Logistics is preferred. In addition to our open door policy and supportive work environment, we also strive to provide a competitive salary and benefits package. TJX considers all applicants for employment without regard to race, color, religion, gender, sexual orientation, national origin, age, disability, gender identity and expression, marital or military status, or based on any individual's status in any group or class protected by applicable federal, state, or local law. TJX also provides reasonable accommodations to qualified individuals with disabilities in accordance with the Americans with Disabilities Act and applicable state and local law. 
Address: Salarpuria Sattva Knowledge City, Inorbit Road. Location: APAC Home Office, Hyderabad, IN.

Posted 6 days ago

Apply

25.0 years

0 Lacs

Kochi, Kerala, India

On-site


Company Overview: Milestone Technologies is a global IT managed services firm that partners with organizations to scale their technology, infrastructure and services to drive specific business outcomes such as digital transformation, innovation, and operational agility. Milestone is focused on building an employee-first, performance-based culture, and for over 25 years we have a demonstrated history of supporting category-defining enterprise clients that are growing ahead of the market. The company specializes in providing solutions across Application Services and Consulting, Digital Product Engineering, Digital Workplace Services, Private Cloud Services, AI/Automation, and ServiceNow. Milestone's culture is built to provide a collaborative, inclusive environment that supports employees and empowers them to reach their full potential. Our seasoned professionals deliver services based on Milestone's best practices and service delivery framework. By leveraging our vast knowledge base to execute initiatives, we deliver both short-term and long-term value to our clients and apply continuous service improvement to deliver transformational benefits to IT. With Intelligent Automation, Milestone helps businesses further accelerate their IT transformation. The result is a sharper focus on business objectives and a dramatic improvement in employee productivity. Through our key technology partnerships and our people-first approach, Milestone continues to deliver industry-leading innovation to our clients. With more than 3,000 employees serving over 200 companies worldwide, we are following our mission of revolutionizing the way IT is deployed.

Job Overview: In this vital role you will be responsible for the development and implementation of our data strategy. The ideal candidate possesses a strong blend of technical expertise and data-driven problem-solving skills. As a Data Engineer, you will play a crucial role in building and optimizing our data pipelines and platforms in a SAFe Agile product team. Contribute to the design, development, and implementation of data pipelines, ETL/ELT processes, and data integration solutions. Deliver data pipeline projects from development to deployment, managing timelines and risks. Ensure data quality and integrity through meticulous testing and monitoring. Leverage cloud platforms (AWS, Databricks) to build scalable and efficient data solutions. Work closely with the product team and key collaborators to understand data requirements. Adhere to data engineering industry standards. Experience developing in an Agile development environment, and comfort with Agile terminology and ceremonies. Familiarity with code versioning using Git and code migration tools. Familiarity with JIRA. Stay up to date with the latest data technologies and trends.

Basic Qualifications (what we expect of you): Doctorate degree, OR Master's degree and 4 to 6 years of Information Systems experience, OR Bachelor's degree and 6 to 8 years of Information Systems experience, OR Diploma and 10 to 12 years of Information Systems experience. Demonstrated hands-on experience with cloud platforms (AWS, Azure, GCP). Proficiency in Python, PySpark, SQL. Development knowledge in Databricks. Good analytical and problem-solving skills to address sophisticated data challenges.

Preferred Qualifications: Experience with data modeling. Experience working with ETL orchestration technologies. Experience with software engineering best practices, including but not limited to version control (Git, Subversion, etc.), CI/CD (Jenkins, Maven, etc.), automated unit testing, and DevOps. Familiarity with SQL/NoSQL databases.

Soft Skills: Skilled in breaking down problems, documenting problem statements, and estimating efforts. Effective communication and interpersonal skills to collaborate with multi-functional teams. Excellent analytical and problem-solving skills. Strong verbal and written communication skills. Ability to work successfully with global teams. High degree of initiative and self-motivation. Team-oriented, with a focus on achieving team goals.

Compensation - Estimated Pay Range: Exact compensation and offers of employment are dependent on the circumstances of each case and will be determined based on job-related knowledge, skills, experience, licenses or certifications, and location.

Our Commitment to Diversity & Inclusion: At Milestone we strive to create a workplace that reflects the communities we serve and work with, where we all feel empowered to bring our full, authentic selves to work. We know creating a diverse and inclusive culture that champions equity and belonging is not only the right thing to do for our employees but is also critical to our continued success. Milestone Technologies provides equal employment opportunity for all applicants and employees. All qualified applicants will receive consideration for employment and will not be discriminated against on the basis of race, color, religion, gender, gender identity, marital status, age, disability, veteran status, sexual orientation, national origin, or any other category protected by applicable federal and state law, or local ordinance. Milestone also makes reasonable accommodations for disabled applicants and employees. We welcome the unique background, culture, experiences, knowledge, innovation, self-expression and perspectives you can bring to our global community. Our recruitment team is looking forward to meeting you.

Posted 6 days ago

Apply

3.0 years

0 Lacs

Hyderabad, Telangana, India

On-site


Company Overview: Viraaj HR Solutions is a leading staffing firm dedicated to connecting talented individuals with exceptional organizations. We focus on delivering innovative and effective workforce solutions tailored to meet the unique needs of our clients. As a company, we prioritize collaboration, integrity, and excellence, ensuring that we create the best opportunities for both our candidates and clients.

Job Title: GCP Data Engineer. Location: India (on-site).

Role Responsibilities: Design, develop, and maintain robust data pipelines on Google Cloud Platform (GCP). Implement ETL processes to extract data from various sources. Optimize data storage solutions and ensure proper data architecture. Collaborate with data scientists and analysts to provide insights through analytics. Monitor and troubleshoot data issues for continuous improvement. Ensure data quality and governance throughout the data lifecycle. Participate in the architecture and schema design of data lakes and warehouses. Create and manage BigQuery datasets, tables, and views. Automate repetitive data tasks to streamline workflows. Document data processes and protocols for future reference. Conduct regular performance tuning of data pipelines and storage. Participate in code reviews and provide support to junior team members. Stay updated on the latest GCP features and industry trends. Work closely with stakeholders to understand data requirements. Provide technical support and guidance on data-related projects.

Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. Minimum 3 years of experience in a data engineering role. Strong understanding of Google Cloud Platform services. Experience with SQL and NoSQL databases. Proficiency in programming languages such as Python. Familiarity with data modeling techniques. Hands-on experience with ETL tools and frameworks. Strong analytical and problem-solving skills. Ability to work in a fast-paced environment and meet deadlines. Excellent verbal and written communication skills. Experience with data visualization tools is a plus. Knowledge of data governance frameworks. Understanding of agile methodologies. Ability to work collaboratively in a team environment. Willingness to continuously learn and adapt to new technologies. Certification in GCP or relevant technologies is preferred.

Skills: NoSQL databases, cloud storage, data modeling, SQL, GCP, data engineering, data architecture, SQL proficiency, Google Cloud Platform (GCP), data visualization tools, ETL processes, data governance, BigQuery, Python, PySpark, agile methodologies.

Posted 6 days ago

Apply

5.0 - 10.0 years

7 - 17 Lacs

Hyderabad, Pune, Chennai

Work from Office


Airflow Data Engineer on the AWS platform. Job Title: Apache Airflow Data Engineer (role as per TCS Role Master).
• 4-8 years of experience in AWS, Apache Airflow (on the Astronomer platform), Python, PySpark, and SQL.
• Good hands-on knowledge of SQL and the data warehousing life cycle is an absolute requirement.
• Experience in creating data pipelines and orchestrating them using Apache Airflow.
• Significant experience with data migrations and development of operational data stores, enterprise data warehouses, data lakes, and data marts.
• Good to have: experience with cloud ETL and ELT in a tool such as DBT, Glue, EMR, Matillion, or any other ELT tool.
• Excellent communication skills to liaise with business and IT stakeholders.
• Expertise in planning project execution and effort estimation.
• Exposure to Agile ways of working.
Candidates for this position will be offered employment with TAIC or TCSL as the entity.
Keywords: data warehousing, PySpark, GitHub, AWS data platform, Glue, EMR, Redshift, Databricks, data marts, DBT/Glue/EMR or Matillion, data engineering, data modelling, data consumption.

Posted 6 days ago

Apply

3.0 - 8.0 years

15 - 20 Lacs

Chennai, Sholinganallur

Hybrid


Position Description: Bachelor's degree. 2+ years in GCP services (BigQuery, Dataflow, Dataproc, Dataplex, Data Fusion, Terraform, Tekton, Cloud SQL, Redis Memory, Airflow, Cloud Storage). 2+ years in data transfer utilities. 2+ years in Git or any other version control tool. 2+ years in Confluent Kafka. 1+ years of experience in API development. 2+ years in an Agile framework. 4+ years of strong experience in Python and PySpark development. 4+ years of shell scripting to develop ad hoc jobs for data importing/exporting.

Skills Required: Google Cloud Platform (BigQuery, Dataflow, Dataproc, Data Fusion, Terraform, Tekton, Cloud SQL, Airflow, Postgres), PySpark, Python, API development.

Posted 6 days ago

Apply

5.0 - 8.0 years

0 Lacs

Indore, Hyderabad, Pune

Work from Office


Required Information / Details:
Role: Digital Python - Python Developer for the AWS platform.
Required Technical Skill Set: Implement features, sub-components and services leveraging AWS services where possible; data parsing and processing using Python; test, troubleshoot, bug fix, and deploy.
Desired Experience Range: 5-10 years.
Location of Requirement: Pune (Sahyadri Park), Hyderabad (Adibatla), Indore.

Desired Competencies (Technical/Behavioral Competency):
Must-have: Hands-on experience in Python and AWS services.
Good-to-have: S3, IAM, Lambda, CloudFormation.

Responsibility of / Expectations from the Role:
1. Interact with the business and understand the requirements.
2. Build technological capabilities and mentor the team.
3. Experience with Agile methodologies - Scrum, continuous integration.
4. Attention to detail; desire and ability to work in a multi-distributed team environment.
5. Ability to excel in short timeframes under short sprints.
6. Strong communication and documentation skills.

Posted 6 days ago

Apply

3.0 years

0 Lacs

Delhi, India

On-site


Company Overview: Viraaj HR Solutions is a leading staffing firm dedicated to connecting talented individuals with exceptional organizations. We focus on delivering innovative and effective workforce solutions tailored to meet the unique needs of our clients. As a company, we prioritize collaboration, integrity, and excellence, ensuring that we create the best opportunities for both our candidates and clients.

Job Title: GCP Data Engineer. Location: India (on-site).

Role Responsibilities: Design, develop, and maintain robust data pipelines on Google Cloud Platform (GCP). Implement ETL processes to extract data from various sources. Optimize data storage solutions and ensure proper data architecture. Collaborate with data scientists and analysts to provide insights through analytics. Monitor and troubleshoot data issues for continuous improvement. Ensure data quality and governance throughout the data lifecycle. Participate in the architecture and schema design of data lakes and warehouses. Create and manage BigQuery datasets, tables, and views. Automate repetitive data tasks to streamline workflows. Document data processes and protocols for future reference. Conduct regular performance tuning of data pipelines and storage. Participate in code reviews and provide support to junior team members. Stay updated on the latest GCP features and industry trends. Work closely with stakeholders to understand data requirements. Provide technical support and guidance on data-related projects.

Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. Minimum 3 years of experience in a data engineering role. Strong understanding of Google Cloud Platform services. Experience with SQL and NoSQL databases. Proficiency in programming languages such as Python. Familiarity with data modeling techniques. Hands-on experience with ETL tools and frameworks. Strong analytical and problem-solving skills. Ability to work in a fast-paced environment and meet deadlines. Excellent verbal and written communication skills. Experience with data visualization tools is a plus. Knowledge of data governance frameworks. Understanding of agile methodologies. Ability to work collaboratively in a team environment. Willingness to continuously learn and adapt to new technologies. Certification in GCP or relevant technologies is preferred.

Skills: NoSQL databases, cloud storage, data modeling, SQL, GCP, data engineering, data architecture, SQL proficiency, Google Cloud Platform (GCP), data visualization tools, ETL processes, data governance, BigQuery, Python, PySpark, agile methodologies.

Posted 6 days ago

Apply

5.0 - 10.0 years

6 - 16 Lacs

Pune, Chennai

Hybrid


Job Title: Python Developer Duration: Full time role Location: Pune/Chennai (Hybrid) Note: At least 5 years of experience is required + strong expertise in Any cloud(AWS/Azure/GCP). Job Description: As a Solutions Integration Engineer on the Professional Services team, you will play a critical role in designing, developing, and deploying integrations in Clients app marketplace. This role is highly technical and cross-functional, working closely with Product, Engineering, Solutions Engineering, and Business Operations teams. Youll be responsible for building cloud-hosted solutions using Clients APIs and webhooks, preferably leveraging serverless architectures on platforms like AWS. You will develop deep expertise in B2B integrations, lead technical implementation efforts, and ensure that integration solutions are scalable, maintainable, and aligned with customer and product needs. What You’ll Do Analyze and understand business and technical integration requirements Design and develop robust B2B integrations using Client’s APIs and webhooks Lead development activities based on requirements from Product and other stakeholders Deploy and manage solutions in cloud environments such as AWS (serverless preferred) Use AI development tools (e.g., Cursor) to improve engineering productivity Conduct testing, validation, and peer code reviews to ensure high-quality deliverables Lead by example through thorough, constructive code reviews and mentorship Collaborate closely with the Developer Ecosystem team to enhance API capabilities and developer experience Share knowledge, best practices, and feedback with internal teams and the broader developer community Contribute to the evolution of Professional Services tools, templates, and methodologies Champion and embody Client’s cultural principles: Focus on Customer Success, Build for the Long Term, Adopt a Growth Mindset, Be Inclusive, Win as a Team Minimum Qualifications: 5+ years of experience designing and developing integration solutions using proven architectural patterns. Needs strong experience in Python. Strong experience translating integration requirements into scalable technical solutions. Proficiency in multiple programming languages such as Python, Ruby, or Go. 3+ years of hands-on experience with cloud platforms (AWS, Azure, or GCP). Experience working with RESTful APIs, webhooks, and modern databases. Experience with CI/CD, Jira and GitHub workflow. Excellent written and verbal communication skills. Proven success managing multiple customer-facing projects simultaneously. Hands-on problem-solving skills and bias toward action with a focus on delivering results

Posted 6 days ago

Apply

3.0 years

0 Lacs

Pune, Maharashtra, India

On-site


Company Overview: Viraaj HR Solutions is a leading staffing firm dedicated to connecting talented individuals with exceptional organizations. We focus on delivering innovative and effective workforce solutions tailored to meet the unique needs of our clients. As a company, we prioritize collaboration, integrity, and excellence, ensuring that we create the best opportunities for both our candidates and clients.

Job Title: GCP Data Engineer. Location: India (on-site).

Role Responsibilities: Design, develop, and maintain robust data pipelines on Google Cloud Platform (GCP). Implement ETL processes to extract data from various sources. Optimize data storage solutions and ensure proper data architecture. Collaborate with data scientists and analysts to provide insights through analytics. Monitor and troubleshoot data issues for continuous improvement. Ensure data quality and governance throughout the data lifecycle. Participate in the architecture and schema design of data lakes and warehouses. Create and manage BigQuery datasets, tables, and views. Automate repetitive data tasks to streamline workflows. Document data processes and protocols for future reference. Conduct regular performance tuning of data pipelines and storage. Participate in code reviews and provide support to junior team members. Stay updated on the latest GCP features and industry trends. Work closely with stakeholders to understand data requirements. Provide technical support and guidance on data-related projects.

Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. Minimum 3 years of experience in a data engineering role. Strong understanding of Google Cloud Platform services. Experience with SQL and NoSQL databases. Proficiency in programming languages such as Python. Familiarity with data modeling techniques. Hands-on experience with ETL tools and frameworks. Strong analytical and problem-solving skills. Ability to work in a fast-paced environment and meet deadlines. Excellent verbal and written communication skills. Experience with data visualization tools is a plus. Knowledge of data governance frameworks. Understanding of agile methodologies. Ability to work collaboratively in a team environment. Willingness to continuously learn and adapt to new technologies. Certification in GCP or relevant technologies is preferred.

Skills: NoSQL databases, cloud storage, data modeling, SQL, GCP, data engineering, data architecture, SQL proficiency, Google Cloud Platform (GCP), data visualization tools, ETL processes, data governance, BigQuery, Python, PySpark, agile methodologies.

Posted 6 days ago

Apply

4.0 - 8.0 years

10 - 20 Lacs

Chennai, Bengaluru

Work from Office


Years of experience: 4 to 8 years (with a minimum of 4 years of relevant experience). Tech stack: AWS, Python, SQL, PySpark.

Posted 6 days ago

Apply

7.0 - 12.0 years

10 - 20 Lacs

Pune, Bengaluru, Mumbai (All Areas)

Work from Office


Professional & Technical Skills - Must-have skills: proficiency in PySpark; a solid grasp of data munging techniques, including data cleaning, transformation, and normalization, to ensure data quality and integrity.

Posted 6 days ago

Apply

12.0 - 22.0 years

30 - 45 Lacs

Chennai

Hybrid


We have an urgent requirement for this role with a leading MNC based out of the Chennai location.
JD: Architect (Big Data)
Professional & Technical Skills:
- Must-have skills: proficiency in Apache Spark and PySpark
- Strong understanding of distributed computing and parallel processing
- Experience with data processing frameworks like Hadoop and Spark
- Hands-on experience with programming languages like Java or Scala
- Knowledge of SQL and database systems
- Familiarity with cloud platforms like AWS or Azure
- Experience with version control systems like Git
- Good-to-have skills: experience with machine learning algorithms and libraries
- Knowledge of data streaming technologies like Kafka
- Understanding of containerization technologies like Docker or Kubernetes

Posted 6 days ago

Apply

3.0 - 8.0 years

12 - 22 Lacs

Hyderabad

Hybrid


Technical Experience: 5+ years of hands-on experience with PySpark, AWS Glue, Python, and SQL. Experience in leading a team and managing delivery. Understanding of the development life cycle and CI/CD pipelines. Experience working with Agile practices and JIRA. Experience in designing data-driven solutions.

Posted 6 days ago

Apply

5.0 - 10.0 years

18 - 25 Lacs

Bengaluru

Hybrid


Skill required: Data Engineer - Azure. Designation: Senior Analyst / Consultant. Job location: Bengaluru. Qualifications: BE/BTech. Years of experience: 4-11 years.

Overall purpose of the job: Understand client requirements and build ETL solutions using Azure Data Factory, Azure Databricks, and PySpark. Build solutions so that they can absorb client change requests easily. Find innovative ways to accomplish tasks and handle multiple projects simultaneously and independently. Work with data and appropriate teams to effectively source required data. Identify data gaps and work with client teams to effectively communicate the findings to stakeholders/clients.

Responsibilities: Develop ETL solutions to populate a centralized repository by integrating data from various data sources. Create data pipelines, data flows, and data models according to business requirements. Implement all transformations according to business needs. Identify data gaps in the data lake and work with relevant data/client teams to get the data required for dashboarding/reporting. Strong experience working on the Azure data platform, Azure Data Factory, and Azure Databricks. Strong experience working on ETL components and scripting languages like PySpark and Python. Experience in creating pipelines, alerts, email notifications, and scheduled jobs. Exposure to development/staging/production environments. Provide support in creating, monitoring, and troubleshooting scheduled jobs. Work effectively with clients and handle client interactions.

Skills Required: Bachelor's degree in Engineering or Science (or equivalent) with at least 4-11 years of overall experience in data management, including data integration, modeling, and optimization. Minimum 4 years of experience working on Azure cloud, Azure Data Factory, and Azure Databricks. Minimum 3-4 years of experience in PySpark, Python, etc. for data ETL. In-depth understanding of data warehouse and ETL concepts and modeling principles. Strong ability to design, build and manage data. Strong understanding of data integration. Strong analytical and problem-solving skills. Strong communication and client interaction skills. Ability to design databases to store the large volumes of data needed for reporting and dashboarding. Ability and willingness to acquire knowledge of new technologies, with good analytical and interpersonal skills and the ability to interact with individuals at all levels.

Posted 6 days ago

Apply

3.0 - 6.0 years

6 - 16 Lacs

Pune

Work from Office


Skills: Performance Testing, Databricks Pipeline

Key Responsibilities: Design and execute performance testing strategies specifically for Databricks-based data pipelines. Identify performance bottlenecks and provide optimization recommendations across Spark/Databricks workloads. Collaborate with development and DevOps teams to integrate performance testing into CI/CD pipelines. Analyze job execution metrics, cluster utilization, memory/storage usage, and latency across various stages of data pipeline processing. Create and maintain performance test scripts, frameworks, and dashboards using tools like JMeter, Locust, or custom Python utilities. Generate detailed performance reports and suggest tuning at the code, configuration, and platform levels. Conduct root cause analysis for slow-running ETL/ELT jobs and recommend remediation steps. Participate in production issue resolution related to performance and contribute to RCA documentation.

Technical Skills (mandatory): Strong understanding of Databricks, Apache Spark, and performance tuning techniques for distributed data processing systems. Hands-on experience in Spark (PySpark/Scala) performance profiling, partitioning strategies, and job parallelization. 2+ years of experience in performance testing and load simulation of data pipelines. Solid skills in SQL and Snowflake, and in analyzing performance via query plans and optimization hints. Familiarity with Azure Databricks, Azure Monitor, Log Analytics, or similar observability tools. Proficient in scripting (Python/shell) for test automation and pipeline instrumentation. Experience with DevOps tools such as Azure DevOps, GitHub Actions, or Jenkins for automated testing. Comfortable working in Unix/Linux environments and writing shell scripts for monitoring and debugging.

Good to have: Experience with job schedulers like Control-M, Autosys, or Azure Data Factory trigger flows. Exposure to CI/CD integration for automated performance validation. Understanding of network/storage I/O tuning parameters in cloud-based environments.

Posted 6 days ago

Apply

12.0 - 22.0 years

30 - 45 Lacs

Chennai

Work from Office


Project Role Description: Design, build and configure applications to meet business process and application requirements.
Must-have skills: Apache Spark. Good-to-have skills: Google BigQuery, PySpark.
Professional & Technical Skills:
- Must-have skills: proficiency in Apache Spark, PySpark, and Google BigQuery.
- Strong understanding of statistical analysis and machine learning algorithms.
- Experience with data visualization tools such as Tableau or Power BI.
- Hands-on experience implementing various machine learning algorithms such as linear regression, logistic regression, decision trees, and clustering algorithms.
- Solid grasp of data munging techniques, including data cleaning, transformation, and normalization, to ensure data quality and integrity.
Additional Information: The candidate should have a minimum of 12 years of experience in Apache Spark.

Posted 6 days ago

Apply

5.0 - 10.0 years

0 - 0 Lacs

Gurugram, Bengaluru, Delhi / NCR

Work from Office


Bachelor's or higher degree in Computer Science or a related discipline, or equivalent (minimum 4 years of work experience).
• At least 2+ years of consulting or client service delivery experience on Azure data solutions.
• At least 2+ years of experience in developing data ingestion, data processing and analytical pipelines for big data, relational databases such as SQL Server, and data warehouse solutions such as Azure Synapse.
• Extensive experience providing practical direction on using Azure native services.
• Extensive hands-on experience implementing data ingestion, ETL and data processing using Azure services: ADLS, Azure Data Factory, Azure Functions, Azure Logic Apps, Synapse/DW, Azure SQL DB, Databricks, etc.
• Experience in data analysis, data debugging, problem solving, and understanding business requirements.
• Minimum of 2+ years of hands-on experience in Azure and big data technologies such as Java, Python, SQL, ADLS/Blob, PySpark and Spark SQL, Databricks, and HDInsight.
• Well versed in DevSecOps and CI/CD deployments.
• Experience in using big data file formats and compression techniques.
• Experience working with developer tools such as Azure DevOps, Visual Studio Team Services, and Git.
• Experience with private and public cloud architecture, their pros and cons, and migration considerations.

Posted 6 days ago

Apply

Exploring PySpark Jobs in India

PySpark, a powerful data processing framework built on top of Apache Spark and Python, is in high demand in the Indian job market. With the increasing need for big data processing and analysis, companies are actively seeking professionals with PySpark skills to join their teams. If you are a job seeker looking to excel in big data and analytics, exploring PySpark jobs in India could be a great career move.
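
For readers new to the framework, here is a minimal, self-contained sketch of the DataFrame API that most of the roles listed above revolve around. It assumes a local installation (for example via `pip install pyspark`); the sample data and column names are illustrative only, not taken from any listing.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session; on Databricks or EMR the session is provided by the platform.
spark = SparkSession.builder.appName("pyspark-intro").master("local[*]").getOrCreate()

# Illustrative sample data: (city, salary in lakhs per annum).
rows = [("Bangalore", 12.0), ("Pune", 9.5), ("Hyderabad", 11.0), ("Pune", 14.0)]
df = spark.createDataFrame(rows, ["city", "salary_lakhs"])

# A typical transformation chain: filter, group, aggregate.
# Transformations are lazy; nothing executes until an action such as show() is called.
summary = (
    df.filter(F.col("salary_lakhs") > 10)
      .groupBy("city")
      .agg(F.avg("salary_lakhs").alias("avg_salary_lakhs"))
)

summary.show()  # action: triggers the actual computation
spark.stop()
```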

Top Hiring Locations in India

Here are five major cities in India where companies are actively hiring for PySpark roles:
1. Bangalore
2. Pune
3. Hyderabad
4. Mumbai
5. Delhi

Average Salary Range

The estimated salary range for PySpark professionals in India varies based on experience levels. Entry-level positions can expect to earn around INR 6-8 lakhs per annum, while experienced professionals can earn upwards of INR 15 lakhs per annum.

Career Path

In the field of PySpark, a typical career progression may look like this:
1. Junior Developer
2. Data Engineer
3. Senior Developer
4. Tech Lead
5. Data Architect

Related Skills

In addition to PySpark, professionals in this field are often expected to have or develop skills in:
  • Python programming
  • Apache Spark
  • Big data technologies (Hadoop, Hive, etc.)
  • SQL
  • Data visualization tools (Tableau, Power BI)

Interview Questions

Here are 25 interview questions you may encounter when applying for PySpark roles; a short code sketch illustrating a few of them follows the list:

  • Explain what PySpark is and its main features (basic)
  • What are the advantages of using PySpark over other big data processing frameworks? (medium)
  • How do you handle missing or null values in PySpark? (medium)
  • What is RDD in PySpark? (basic)
  • What is a DataFrame in PySpark and how is it different from an RDD? (medium)
  • How can you optimize performance in PySpark jobs? (advanced)
  • Explain the difference between map and flatMap transformations in PySpark (basic)
  • What is the role of a SparkContext in PySpark? (basic)
  • How do you handle schema inference in PySpark? (medium)
  • What is a SparkSession in PySpark? (basic)
  • How do you join DataFrames in PySpark? (medium)
  • Explain the concept of partitioning in PySpark (medium)
  • What is a UDF in PySpark? (medium)
  • How do you cache DataFrames in PySpark for optimization? (medium)
  • Explain the concept of lazy evaluation in PySpark (medium)
  • How do you handle skewed data in PySpark? (advanced)
  • What is checkpointing in PySpark and how does it help in fault tolerance? (advanced)
  • How do you tune the performance of a PySpark application? (advanced)
  • Explain the use of Accumulators in PySpark (advanced)
  • How do you handle broadcast variables in PySpark? (advanced)
  • What are the different data sources supported by PySpark? (medium)
  • How can you run PySpark on a cluster? (medium)
  • What is the purpose of the PySpark MLlib library? (medium)
  • How do you handle serialization and deserialization in PySpark? (advanced)
  • What are the best practices for deploying PySpark applications in production? (advanced)
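
As noted above, the sketch below works through a few of the questions in the list: handling null values, joining DataFrames with a broadcast hint, defining a UDF, and caching. The sample data, column names, and the small lookup table are illustrative assumptions rather than canonical answers, and in practice built-in functions are preferred over UDFs where available.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("interview-sketches").master("local[*]").getOrCreate()

# Illustrative data; the second row has a missing city value.
people = spark.createDataFrame(
    [(1, "Asha", "Bangalore"), (2, "Ravi", None), (3, "Meena", "Pune")],
    ["id", "name", "city"],
)
cities = spark.createDataFrame(
    [("Bangalore", "Karnataka"), ("Pune", "Maharashtra")],
    ["city", "state"],
)

# Handling missing/null values: fill with a default (na.drop() would remove the row instead).
cleaned = people.na.fill({"city": "Unknown"})

# Joining DataFrames: broadcast the small lookup table so the large side is not shuffled.
joined = cleaned.join(F.broadcast(cities), on="city", how="left")

# UDF: a Python function applied to a column; built-in functions are faster when available.
first_letter = F.udf(lambda name: name[0].upper() if name else None, StringType())
result = joined.withColumn("initial", first_letter(F.col("name")))

# Caching: persist a DataFrame that multiple actions will reuse.
result.cache()
result.show()          # first action materializes and caches the result
print(result.count())  # second action reads from the cache

spark.stop()
```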

Closing Remark

As you explore PySpark jobs in India, remember to prepare thoroughly for interviews and showcase your expertise confidently. With the right skills and knowledge, you can excel in this field and advance your career in the world of big data and analytics. Good luck!
