
344 HDFS Jobs - Page 7

JobPe aggregates job listings for easy access, but you apply directly on the original job portal.

7.0 - 12.0 years

7 - 11 Lacs

Bengaluru, Karnataka, India

On-site

Essential Responsibilities: As a Lead or Principal Data Engineer, your responsibilities will include:
- Building, refining, tuning, and maintaining our real-time and batch data infrastructure
- Using technologies such as HDFS, Spark, Snowflake, Hive, HBase, Scylla, Django, and FastAPI daily
- Maintaining data quality and accuracy across production data systems
- Working with Data Engineers to optimize data models and workflows
- Working with Data Analysts to develop ETL processes for analysis and reporting
- Working with Product Managers to design and build data products
- Working with our DevOps team to scale and optimize our data infrastructure
- Participating in architecture discussions, influencing the roadmap, and taking ownership of new projects
- Participating in a 24/7 on-call rotation (be available by phone or email in case something goes wrong)

Desired Characteristics:
- Minimum 7 years of software engineering experience
- Proven long-term experience with, and enthusiasm for, distributed data processing at scale; eagerness to learn new things
- Expertise in designing and architecting distributed, low-latency, scalable solutions in both cloud and on-premises environments
- Exposure to the whole software development lifecycle, from inception to production and monitoring
- Fluency in Python, or solid experience in Scala or Java
- Proficiency with relational databases and advanced SQL
- Expert use of services such as Spark, HDFS, Hive, and HBase
- Experience with a scheduler such as Apache Airflow, Apache Luigi, or Chronos
- Experience using cloud services (AWS) at scale
- Experience with agile software development processes
- Excellent interpersonal and communication skills

Nice to have:
- Experience with large-scale / multi-tenant distributed systems
- Experience with columnar / NoSQL databases: Vertica, Snowflake, HBase, Scylla, Couchbase
- Experience with real-time streaming frameworks: Flink, Storm
- Experience with web frameworks such as Flask or Django
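As context for the scheduler-plus-Spark stack this posting lists (Airflow, HDFS, Hive, Spark), here is a minimal, hedged sketch of an Apache Airflow DAG that submits a nightly PySpark batch job via spark-submit. The DAG id, schedule, and script path are illustrative assumptions, not details from the posting.

```python
# Minimal sketch: an Airflow DAG that submits a nightly PySpark batch job on YARN.
# DAG id, schedule, and script path are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_events_batch",           # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",           # run daily at 02:00
    catchup=False,
) as dag:
    # Submit a PySpark job that reads raw events from HDFS and writes a Hive table.
    submit_batch = BashOperator(
        task_id="spark_submit_events_batch",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "/opt/jobs/events_batch.py --run-date {{ ds }}"   # {{ ds }} is the logical run date
        ),
    )
```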

Posted 1 month ago

Apply

4.0 - 7.0 years

7 - 17 Lacs

Hyderabad

Work from Office

Big Data and Hadoop Developer
Location: Hyderabad | Experience: 4-7 Years
Skills: Hive/Oracle/MySQL, Data Architecture, Modelling (Conceptual/Logical/Design/ER model)

Responsibilities / Expectations from the Role:
- Development, support, and maintenance of the infrastructure platform and application lifecycle
- Design, development, and implementation of automation innovations
- Development of automated testing scripts
- Contribution to all phases of the application lifecycle: requirements, development, testing, implementation, and support
- Responding and providing guidance to customers of the Big Data platform
- Defining and implementing integration points with existing technology systems
- Researching and remaining current on big data technology, industry trends, and innovations

Posted 1 month ago

Apply

6.0 - 10.0 years

20 - 30 Lacs

Egypt, Chennai, Bengaluru

Hybrid

We're Hiring: MLOps Engineer | Cairo, Egypt | Immediate Joiners Only
Share CVs to vijay.s@xebia.com
Location: Cairo, Egypt | Experience: 6-8 Years | Mode: Onsite
Joining: Immediate or max 2 weeks' notice | Relocation: Open to relocating to Egypt ASAP

Job Summary: Xebia is seeking a seasoned MLOps Engineer to scale and operationalize ML solutions for our strategic client in Cairo. This is an onsite role, perfect for professionals who are ready to deploy cutting-edge ML pipelines in real-world enterprise environments.

Key Responsibilities:
- Design and manage end-to-end scalable, reliable ML pipelines
- Build CI/CD pipelines with Azure DevOps
- Deploy and track ML models using MLflow
- Work on large-scale data with Cloudera/Hadoop (Hive, Spark, HDFS)
- Support Knowledge Graphs, metadata enrichment, and model lineage
- Collaborate with DS and engineering teams to ensure governance and auditability
- Implement model performance monitoring, drift detection, and data quality checks
- Support DevOps automation aligned with enterprise-grade compliance standards

Required Skills:
- 6-8 years in MLOps / Machine Learning Engineering
- Hands-on with MLflow, Azure DevOps, Python
- Deep experience with Cloudera, Hadoop, Spark, Hive
- Exposure to Knowledge Graphs and containerization (Docker/Kubernetes)
- Familiarity with TensorFlow, scikit-learn, or PyTorch
- Understanding of data security, access controls, and audit logging

Preferred:
- Azure certifications (e.g., Azure Data Engineer / AI Engineer Associate)
- Experience with Apache NiFi, Airflow, or similar tools
- Background in regulated sectors such as BFSI, Healthcare, or Pharma

Soft Skills:
- Strong problem-solving and analytical thinking
- Clear communication and stakeholder engagement
- Passion for automation and continuous improvement

Additional Information: Only apply if you can join within 2 weeks or are an immediate joiner, you are open to relocating to Cairo, Egypt ASAP, and you hold a valid passport. Visa-on-arrival/B1/Schengen holders from the MEA region are preferred.

To Apply: Send your updated CV to vijay.s@xebia.com along with: Full Name, Total Experience, Current CTC, Expected CTC, Current Location, Preferred Xebia Location (Cairo), Notice Period / Last Working Day (if serving), Primary Skills, LinkedIn Profile, Valid Passport Number.

Be part of a global transformation journey: make AI work at scale!
#MLOps #Hiring #AzureDevOps #MLflow #CairoJobs #ImmediateJoiners #DataEngineering #Cloudera #Hadoop #XebiaCareers
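To illustrate the MLflow model-tracking piece of this role, below is a minimal, hedged sketch that logs parameters, metrics, and a scikit-learn model to an MLflow run. The experiment name, dataset, and metric are illustrative assumptions.

```python
# Minimal MLflow sketch: log parameters, metrics, and a model for later deployment.
# The experiment name and model choice are hypothetical placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("churn-model")                  # hypothetical experiment name

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = LogisticRegression(max_iter=500).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))

    mlflow.log_param("max_iter", 500)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, "model")          # the artifact a CI/CD pipeline would deploy
```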

Posted 1 month ago

Apply

6.0 - 10.0 years

20 - 30 Lacs

Bengaluru

Hybrid

Hiring: Data Engineer SDE2 | Bangalore | Immediate Joiners
Join our growing team at Xebia! We're looking for a Data Engineer (SDE2 level) with strong hands-on expertise in Python, PySpark, Hive, Hadoop HDFS, Oozie, and Yarn.

Role Details:
- Position: Data Engineer SDE2
- Experience: 6-10 Years
- Location: Bangalore
- Mode of Work: Hybrid (work from the client location in Whitefield 3 days a week)
- Notice Period: Only immediate joiners or up to 2 weeks' notice will be considered

Technical Skills Required:
- Proficiency in Python and PySpark
- Experience with Hive, Hadoop HDFS, Oozie, and Yarn
- Strong problem-solving and data engineering fundamentals

How to Apply: If you meet the criteria and are ready to take the next step in your career, please email your resume to vijay.s@xebia.com with the following details: Full Name, Total Experience, Current CTC, Expected CTC, Current Location, Preferred Xebia Location, Notice Period / Last Working Day (if serving), Primary Skills, LinkedIn Profile.

Why Xebia? Work on cutting-edge data engineering projects with a global tech leader. Join a collaborative team culture that values innovation and agility. Apply now – we're moving fast!
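For reference on the Python/PySpark/Hive/HDFS stack named above, here is a minimal, hedged PySpark sketch that reads raw Parquet files from HDFS, aggregates them, and writes a partitioned Hive table. The paths, database, and column names are illustrative assumptions.

```python
# Minimal PySpark sketch: read raw files from HDFS, aggregate, write a partitioned Hive table.
# Paths, database, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("events-daily-load")
    .enableHiveSupport()        # required to write managed Hive tables
    .getOrCreate()
)

raw = spark.read.parquet("hdfs:///data/raw/events/")          # hypothetical HDFS path

daily = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "user_id")
       .agg(F.count("*").alias("event_count"))
)

(
    daily.write
         .mode("overwrite")
         .partitionBy("event_date")                           # partition pruning on date
         .saveAsTable("analytics.user_daily_events")          # hypothetical Hive table
)
```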

Posted 1 month ago

Apply

2.0 - 6.0 years

0 Lacs

Maharashtra

On-site

Job Description: We are looking for a skilled PySpark Developer with 4-5 or 2-3 years of experience to join our team. As a PySpark Developer, you will be responsible for developing and maintaining data processing pipelines using PySpark, Apache Spark's Python API. You will work closely with data engineers, data scientists, and other stakeholders to design and implement scalable and efficient data processing solutions. A Bachelor's or Master's degree in Computer Science, Data Science, or a related field is required, and the candidate should be below 35 years of age.

The ideal candidate should have strong expertise in the Big Data ecosystem, including Spark, Hive, Sqoop, HDFS, MapReduce, Oozie, Yarn, HBase, and NiFi, with experience in designing, developing, and maintaining PySpark data processing pipelines that handle large volumes of structured and unstructured data. You will collaborate with data engineers and data scientists to understand data requirements and design efficient data models and transformations. Optimizing and tuning PySpark jobs for performance, scalability, and reliability is a key responsibility, as is implementing data quality checks, error handling, and monitoring mechanisms to ensure data accuracy and pipeline robustness. You will also develop and maintain documentation for PySpark code, data pipelines, and data workflows.

Experience in developing production-ready Spark applications using Spark RDD APIs, DataFrames, Datasets, Spark SQL, and Spark Streaming is required. Strong experience with Hive bucketing and partitioning, as well as writing complex Hive queries using analytical functions, is essential; knowledge of writing custom UDFs in Hive to support custom business requirements is a plus.

If you meet the above qualifications and are interested in this position, please email your resume, mentioning the position applied for in the subject line, to careers@cdslindia.com.
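Since the posting highlights Hive bucketing/partitioning and custom UDFs, here is a minimal, hedged PySpark sketch showing both ideas: a Python UDF applied before writing a partitioned, bucketed Hive table. Table, column, and function names are illustrative assumptions (Hive UDFs proper are usually written in Java; a PySpark UDF is shown here only to illustrate the concept).

```python
# Minimal sketch: apply a custom PySpark UDF, then write a partitioned + bucketed Hive table.
# Paths, table names, columns, and the masking rule are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("trades-load").enableHiveSupport().getOrCreate()

trades = spark.read.parquet("hdfs:///data/raw/trades/")       # hypothetical HDFS source

# Custom UDF: keep only the last four characters of the account number (illustrative rule).
@F.udf(returnType=StringType())
def mask_account(acct):
    return None if acct is None else "XXXX" + acct[-4:]

masked = trades.withColumn("account_id", mask_account("account_id"))

# Partition by trade_date for pruning; bucket by symbol to speed up joins on that key.
(
    masked.write
          .mode("overwrite")
          .partitionBy("trade_date")
          .bucketBy(16, "symbol")
          .sortBy("symbol")
          .saveAsTable("finance.trades_masked")
)
```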

Posted 1 month ago

Apply

6.0 - 10.0 years

0 Lacs

Karnataka

On-site

As an experienced Big Data Engineer with 6 to 8 years of experience, you will be responsible for designing, developing, and maintaining robust and scalable data processing applications using Apache Spark with Scala. Your key responsibilities include implementing and managing complex data workflows, conducting performance optimization and memory tuning for Spark jobs, and working proficiently with Apache Hive for data warehousing and HDFS for distributed storage within the Hadoop ecosystem.

You will utilize your strong hands-on experience with Apache Spark using Scala, Hive, and HDFS, along with proficiency in Oozie workflows, ScalaTest, and Spark performance tuning. Your deep understanding of the Spark UI, YARN logs, and debugging of distributed jobs will be essential in diagnosing and resolving complex issues in distributed data systems to ensure data accuracy and pipeline efficiency.

In addition, you will write comprehensive unit tests using ScalaTest and follow best practices for building scalable and reliable data pipelines, contributing to the successful execution of data projects in an enterprise-level distributed data system environment. Your familiarity with Agile/Scrum methodologies will enable you to collaborate with cross-functional teams to deliver high-quality data solutions, and your working knowledge of CI/CD pipelines, GitHub, Maven, and Nexus will be used for continuous integration, delivery, and version control.

Overall, you will play a crucial role in the development, optimization, and maintenance of data processing applications, ensuring high performance, reliability, and efficiency in data workflows within the Big Data platform.

Posted 1 month ago

Apply

3.0 - 7.0 years

0 Lacs

Hyderabad, Telangana

On-site

As a Data Engineer, you will be responsible for designing, developing, and maintaining scalable data pipelines using Spark, specifically PySpark or Spark with Scala. Your role will involve building data ingestion and transformation frameworks for various structured and unstructured data sources. Collaboration with data analysts, data scientists, and business stakeholders is essential to understand requirements and deliver reliable data solutions.

Working with large volumes of data, you will ensure quality, integrity, and consistency while optimizing data workflows for performance, scalability, and cost efficiency on cloud platforms such as AWS, Azure, or GCP. Implementation of data quality checks and automation for ETL/ELT pipelines is a critical aspect of this role. Monitoring and troubleshooting data issues in production, along with performing root cause analysis, will be part of your responsibilities. Additionally, documenting technical processes, system designs, and operational procedures will be necessary.

The ideal candidate for this position should have at least 3 years of experience as a Data Engineer or in a similar role. Hands-on experience with PySpark or Spark using Scala is required, along with strong knowledge of SQL for data querying and transformation. Experience working with any cloud platform (AWS, Azure, or GCP), a solid understanding of data warehousing concepts and big data architecture, and familiarity with version control systems like Git are also must-have skills.

In addition to the must-have skills, it would be beneficial to have experience with data orchestration tools like Apache Airflow, Databricks Workflows, or similar; knowledge of Delta Lake, HDFS, or Kafka; familiarity with containerization tools (Docker/Kubernetes); exposure to CI/CD practices and DevOps principles; and an understanding of data governance, security, and compliance standards.

If you meet the above requirements and are ready to join immediately, please share your details via email to nitin.patil@ust.com for quick processing. Act fast for immediate attention!
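As an illustration of the "data quality checks and automation for ETL/ELT pipelines" responsibility described above, here is a minimal, hedged PySpark sketch that audits null keys and duplicates and fails the run when thresholds are breached. The path, key column, and thresholds are illustrative assumptions.

```python
# Minimal data quality sketch: null-key and duplicate audit that fails the pipeline run.
# The source path, key column, and thresholds are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.parquet("s3://bucket/ingested/orders/")       # hypothetical cloud path

total = df.count()
null_ids = df.filter(F.col("order_id").isNull()).count()
dupes = total - df.dropDuplicates(["order_id"]).count()

checks = {
    "null_order_id_pct": null_ids / total if total else 0.0,
    "duplicate_rows": dupes,
}

# Stop the run if thresholds are breached, so bad data never reaches downstream consumers.
if checks["null_order_id_pct"] > 0.01 or checks["duplicate_rows"] > 0:
    raise ValueError(f"Data quality checks failed: {checks}")
```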

Posted 1 month ago

Apply

7.0 - 11.0 years

0 Lacs

Karnataka

On-site

As a skilled Senior Engineer at Impetus Technologies, you will utilize your expertise in Java and Big Data technologies to design, develop, and deploy scalable data processing applications. Your responsibilities will include collaborating with cross-functional teams, developing high-quality code, and optimizing data processing workflows. Additionally, you will mentor junior engineers and contribute to architectural decisions to enhance system performance and scalability.

Key Responsibilities:
- Design, develop, and maintain high-performance applications using Java and Big Data technologies.
- Implement data ingestion and processing workflows with frameworks like Hadoop and Spark.
- Collaborate with the data architecture team to define efficient data models.
- Optimize existing applications for performance, scalability, and reliability.
- Mentor junior engineers, provide technical leadership, and promote continuous improvement.
- Participate in code reviews and ensure best practices for coding, testing, and documentation.
- Stay up-to-date with technology trends in Java and Big Data, and evaluate new tools and methodologies.

Skills and Tools Required:
- Strong proficiency in Java programming for building complex applications.
- Hands-on experience with Big Data technologies like Apache Hadoop, Apache Spark, and Apache Kafka.
- Understanding of distributed computing concepts and technologies.
- Experience with data processing frameworks and libraries such as MapReduce and Spark SQL.
- Familiarity with database systems like HDFS, NoSQL databases (e.g., Cassandra, MongoDB), and SQL databases.
- Strong problem-solving skills and the ability to troubleshoot complex issues.
- Knowledge of version control systems like Git and familiarity with CI/CD pipelines.
- Excellent communication and teamwork skills for effective collaboration.

About the Role: You will be responsible for designing and developing scalable Java applications for Big Data processing, collaborating with cross-functional teams to implement innovative solutions, and ensuring code quality and performance through best practices and testing methodologies.

About the Team: You will work with a diverse team of skilled engineers, data scientists, and product managers in a collaborative environment that encourages knowledge sharing and continuous learning. Technical workshops and brainstorming sessions will provide opportunities to enhance your skills and stay updated with industry trends.

Responsibilities:
- Developing and maintaining high-performance Java applications for efficient data processing.
- Implementing data integration and processing frameworks using Big Data technologies.
- Troubleshooting and optimizing systems to enhance performance and scalability.

To succeed in this role, you should have:
- Strong proficiency in Java and experience with Big Data technologies and frameworks.
- A solid understanding of data structures, algorithms, and software design principles.
- Excellent problem-solving skills and the ability to work independently and within a team.
- Familiarity with cloud platforms and distributed computing concepts is a plus.

Qualification: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Experience: 7 to 10 years
Job Reference Number: 13131

Posted 1 month ago

Apply

3.0 - 7.0 years

0 Lacs

Karnataka

On-site

The ideal candidate for this position should have at least 5 years of experience in designing, developing, and maintaining scalable data pipelines using Spark, specifically with PySpark or Spark with Scala. You will be responsible for building data ingestion and transformation frameworks for structured and unstructured data sources, collaborating with data analysts, data scientists, and business stakeholders to understand requirements, and delivering reliable data solutions.

Working with large volumes of data, you will ensure quality, integrity, and consistency while optimizing data workflows for performance, scalability, and cost efficiency on cloud platforms such as AWS, Azure, or GCP. Implementing data quality checks and automation for ETL/ELT pipelines will be part of your responsibilities, along with monitoring and troubleshooting data issues in production and performing root cause analysis. Documenting technical processes, system designs, and operational procedures is also a key aspect of this role.

Required skills for this position include at least 3 years of experience as a Data Engineer or in a similar role, hands-on experience with PySpark or Spark using Scala, strong knowledge of SQL for data querying and transformation, experience working with any cloud platform (AWS, Azure, or GCP), a solid understanding of data warehousing concepts and big data architecture, and experience with version control systems like Git.

Desirable skills that would be beneficial for this role include knowledge of Delta Lake, HDFS, or Kafka; familiarity with containerization tools like Docker or Kubernetes; exposure to CI/CD practices and DevOps principles; and an understanding of data governance, security, and compliance standards.

Candidates ready to join immediately can share their details via email to nitin.patil@ust.com. Act fast for immediate attention!

Posted 1 month ago

Apply

3.0 - 7.0 years

0 Lacs

Kolkata, West Bengal

On-site

Candidates who are ready to join immediately can share their details via email for quick processing to nitin.patil@ust.com. Act fast for immediate attention!

With over 5 years of experience, the ideal candidate will be responsible for designing, developing, and maintaining scalable data pipelines using Spark, either PySpark or Spark with Scala. They will also be tasked with building data ingestion and transformation frameworks for structured and unstructured data sources. Collaboration with data analysts, data scientists, and business stakeholders to understand requirements and deliver reliable data solutions is a key aspect of the role. The candidate will work with large volumes of data to ensure quality, integrity, and consistency, optimizing data workflows for performance, scalability, and cost efficiency on cloud platforms such as AWS, Azure, or GCP. Implementation of data quality checks and automation for ETL/ELT pipelines, as well as monitoring and troubleshooting data issues in production, are also part of the responsibilities. Documentation of technical processes, system designs, and operational procedures will be essential.

Must-Have Skills:
- At least 3 years of experience as a Data Engineer or in a similar role.
- Hands-on experience with PySpark or Spark using Scala.
- Strong knowledge of SQL for data querying and transformation.
- Experience working with any cloud platform (AWS, Azure, or GCP).
- Solid understanding of data warehousing concepts and big data architecture.
- Experience with version control systems like Git.

Good-to-Have Skills:
- Experience with data orchestration tools like Apache Airflow, Databricks Workflows, or similar.
- Knowledge of Delta Lake, HDFS, or Kafka.
- Familiarity with containerization tools (Docker/Kubernetes).
- Exposure to CI/CD practices and DevOps principles.
- Understanding of data governance, security, and compliance standards.

Posted 1 month ago

Apply

3.0 - 7.0 years

0 Lacs

Thiruvananthapuram, Kerala

On-site

Candidates ready to join immediately can share their details via email for quick processing at nitin.patil@ust.com. Act fast for immediate attention!

With over 5 years of experience, you will be responsible for designing, developing, and maintaining scalable data pipelines using Spark (PySpark or Spark with Scala). You will also build data ingestion and transformation frameworks for structured and unstructured data sources. Collaboration with data analysts, data scientists, and business stakeholders to understand requirements and deliver reliable data solutions will be a key aspect of the role. Working with large volumes of data, ensuring quality, integrity, and consistency, and optimizing data workflows for performance, scalability, and cost efficiency on cloud platforms (AWS, Azure, or GCP) are essential responsibilities. Additionally, implementing data quality checks and automation for ETL/ELT pipelines, monitoring and troubleshooting data issues in production, and performing root cause analysis will be part of your duties. You will also be expected to document technical processes, system designs, and operational procedures.

Must-Have Skills:
- Minimum 3 years of experience as a Data Engineer or in a similar role.
- Hands-on experience with PySpark or Spark using Scala.
- Strong knowledge of SQL for data querying and transformation.
- Experience working with any cloud platform (AWS, Azure, or GCP).
- Solid understanding of data warehousing concepts and big data architecture.
- Familiarity with version control systems like Git.

Good-to-Have Skills:
- Experience with data orchestration tools such as Apache Airflow, Databricks Workflows, or similar.
- Knowledge of Delta Lake, HDFS, or Kafka.
- Familiarity with containerization tools like Docker/Kubernetes.
- Exposure to CI/CD practices and DevOps principles.
- Understanding of data governance, security, and compliance standards.

Posted 1 month ago

Apply

3.0 - 7.0 years

0 Lacs

Noida, Uttar Pradesh

On-site

The ideal candidate for this position should have at least 5 years of experience and must be ready to join immediately. In this role, you will be responsible for designing, developing, and maintaining scalable data pipelines using Spark, specifically PySpark or Spark with Scala. You will also be tasked with building data ingestion and transformation frameworks for structured and unstructured data sources. Collaboration with data analysts, data scientists, and business stakeholders to understand requirements and deliver reliable data solutions is a key aspect of this role. Working with large volumes of data to ensure quality, integrity, and consistency is crucial, as is optimizing data workflows for performance, scalability, and cost efficiency on cloud platforms such as AWS, Azure, or GCP. Implementing data quality checks and automation for ETL/ELT pipelines, monitoring and troubleshooting data issues in production, and performing root cause analysis are also essential tasks, along with documenting technical processes, system designs, and operational procedures.

The must-have skills for this role include at least 3 years of experience as a Data Engineer or in a similar role, hands-on experience with PySpark or Spark using Scala, strong knowledge of SQL for data querying and transformation, experience working with any cloud platform (AWS, Azure, or GCP), a solid understanding of data warehousing concepts and big data architecture, and experience with version control systems like Git.

Good-to-have skills include experience with data orchestration tools like Apache Airflow, Databricks Workflows, or similar; knowledge of Delta Lake, HDFS, or Kafka; familiarity with containerization tools such as Docker/Kubernetes; exposure to CI/CD practices and DevOps principles; and an understanding of data governance, security, and compliance standards.

If you meet the qualifications and are interested in this exciting opportunity, please share your details via email at nitin.patil@ust.com for quick processing. Act fast for immediate attention!

Posted 1 month ago

Apply

3.0 - 7.0 years

0 Lacs

Pune, Maharashtra

On-site

As a Data Engineer, you will be responsible for designing, developing, and maintaining scalable data pipelines using Spark (PySpark or Spark with Scala). Your role will involve building data ingestion and transformation frameworks for structured and unstructured data sources. Collaborating with data analysts, data scientists, and business stakeholders to understand requirements and deliver reliable data solutions will be a key aspect of your responsibilities. Additionally, you will work with large volumes of data to ensure quality, integrity, and consistency, optimizing data workflows for performance, scalability, and cost efficiency on cloud platforms such as AWS, Azure, or GCP. Implementing data quality checks and automation for ETL/ELT pipelines, monitoring and troubleshooting data issues in production, and documenting technical processes, system designs, and operational procedures are also part of your duties.

To excel in this role, you should have at least 3 years of experience as a Data Engineer or in a similar role. Hands-on experience with PySpark or Spark using Scala is essential, along with strong knowledge of SQL for data querying and transformation. You should also have experience working with any cloud platform (AWS, Azure, or GCP), a solid understanding of data warehousing concepts and big data architecture, and familiarity with version control systems like Git.

While not mandatory, it would be beneficial to have experience with data orchestration tools like Apache Airflow, Databricks Workflows, or similar; knowledge of Delta Lake, HDFS, or Kafka; familiarity with containerization tools such as Docker or Kubernetes; exposure to CI/CD practices and DevOps principles; and an understanding of data governance, security, and compliance standards.

If you are ready to join immediately and possess the required skills and experience, please share your details via email at nitin.patil@ust.com. Act fast for immediate attention!

Posted 1 month ago

Apply

2.0 - 6.0 years

0 Lacs

Hyderabad, Telangana

On-site

As a Software Engineer II at JPMorgan Chase within the Employee Platforms team, you will be part of an agile team responsible for designing and delivering trusted, market-leading technology products in a secure, stable, and scalable manner to support the firm's business objectives. Your role involves executing creative software solutions, developing high-quality production code, and identifying opportunities to automate remediation of recurring issues to enhance operational stability. You will lead evaluation sessions with external vendors, startups, and internal teams to drive outcomes-oriented probing of architectural designs, technical credentials, and applicability for use within existing systems. Additionally, you will lead communities of practice across Software Engineering to promote the adoption of new and leading-edge technologies.

Your responsibilities will include collaborating with stakeholders, demonstrating strong expertise in solving business problems through innovation, and managing the firm's capital reserves effectively. You will collaborate across teams to drive features, eliminate blockers, and produce high-quality documentation of cloud solutions as reusable patterns.

To qualify for this role, you should have formal training or certification in software engineering concepts along with at least 2 years of applied experience. You must possess hands-on practical experience in system design, application development, testing, and operational stability. Proficiency in automation, continuous delivery methods, agile methodologies, and advanced programming languages is essential. In addition, knowledge of financial services industry IT systems, Agile SDLC, and technologies such as Python, Big Data, Hadoop, Spark, Scala, Splunk, and application, data, and infrastructure architecture is required.

Preferred qualifications include excellent team spirit, the ability to work collaboratively, and knowledge of financial instruments. Proficiency in Core Java 8, Spring, JPA/Hibernate, and React JavaScript is desirable for this role.

Posted 1 month ago

Apply

6.0 - 8.0 years

18 - 30 Lacs

Hyderabad

Work from Office

Key Skills: Hadoop, Cloudera, HDFS, YARN, Spark, Delta Lake, Linux, Docker, Kubernetes, Jenkins, REST API, Prometheus, Grafana, Splunk, PySpark, Python, Terraform, Ansible, GCP, DevOps, CI/CD, SRE, Agile, Infrastructure Automation

Roles & Responsibilities:
- Lead and support technology teams in designing, developing, and managing data engineering and CI/CD pipelines and infrastructure.
- Act as an Infrastructure/DevOps SME in designing and implementing solutions for risk analytics systems transformation, both tactical and strategic, aligned with regulatory and business initiatives.
- Collaborate with other technology teams, IT support teams, and architects to drive improvements in product delivery.
- Manage daily interactions with IT and central DevOps/infrastructure teams to ensure continuous support and delivery.
- Grow the technical expertise within the engineering community by mentoring and sharing knowledge.
- Design, maintain, and improve the full software delivery lifecycle.
- Enforce process discipline and improvements in areas like agile software delivery, production support, and DevOps pipeline development.

Experience Requirement:
- 6-8 years of experience in platform engineering, SRE roles, and managing distributed/big data infrastructures.
- Strong hands-on experience with the Hadoop ecosystem, big data pipelines, and Delta Lake.
- Proven expertise in Cloudera Hadoop cluster management, including HDFS, YARN, and Spark.
- In-depth knowledge of networking, Linux, HDFS, and DevSecOps tools like Docker, Kubernetes, and Jenkins.
- Skilled in containerization with Docker and orchestration using Kubernetes.
- Hands-on experience designing and managing large-scale tech projects, including REST API standards.
- Experience with monitoring and logging tools such as Prometheus, Grafana, and Splunk.
- Global collaboration experience with IT and support teams across geographies.
- Strong coding skills in Spark (PySpark) and Python, with at least 3 years of experience.
- Expertise in Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Working knowledge of GCP or other cloud platforms and their data engineering products is preferred.
- Familiarity with agile methodologies, with strong problem-solving and team collaboration skills.

Education: B.Tech/M.Tech (Dual), B.Tech, or M.Tech

Posted 2 months ago

Apply

3.0 - 6.0 years

3 - 15 Lacs

Hyderabad, Telangana, India

On-site

- 5 years of experience in Big Data and Hive
- Experience in Python (Flask/FastAPI) and PySpark
- Experience in MongoDB and Postgres
- Extensive knowledge of SQL writing
- Good understanding of performance tuning and security aspects
- Very good analytical skills
- Very good knowledge and hands-on experience in Big Data and Hive
- Very good knowledge and hands-on experience in Python (Flask/FastAPI) and PySpark
- Good understanding of functional application development
- Unit Testing: should be efficient in writing unit test cases (positive and negative scenarios) and executing them
- Experience working in an Agile environment
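Since this posting pairs Python (Flask/FastAPI) with an emphasis on positive and negative unit test cases, here is a minimal, hedged FastAPI sketch with one read endpoint and two pytest-style tests. The route, data, and test names are illustrative assumptions, and an in-memory dict stands in for the Hive/Postgres data layer.

```python
# Minimal FastAPI sketch: one read endpoint plus positive/negative unit tests (run with pytest).
# The route, sample data, and test names are hypothetical placeholders.
from fastapi import FastAPI, HTTPException
from fastapi.testclient import TestClient

app = FastAPI()

# In a real service this would query Hive/Postgres; a dict stands in for the data layer here.
ORDERS = {"A100": {"order_id": "A100", "status": "SHIPPED"}}

@app.get("/orders/{order_id}")
def get_order(order_id: str):
    order = ORDERS.get(order_id)
    if order is None:
        raise HTTPException(status_code=404, detail="order not found")
    return order

client = TestClient(app)

def test_get_order_positive():
    # Positive scenario: known order returns 200.
    assert client.get("/orders/A100").status_code == 200

def test_get_order_negative():
    # Negative scenario: unknown order returns 404.
    assert client.get("/orders/MISSING").status_code == 404
```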

Posted 2 months ago

Apply

5.0 - 8.0 years

3 - 15 Lacs

Hyderabad, Telangana, India

On-site

Principal Responsibilities:
- Primary skills: HDP, CDP, Linux, Python, Ansible, and Kubernetes
- Proficient in shell scripting and YAML
- Good technical design, problem-solving, and debugging skills
- Understanding of how to use GitHub, Jenkins, Ansible, etc.
- Understanding of CI/CD concepts
- Familiar with test automation tools: JUnit, Selenium, Cucumber
- Hands-on development of solutions using industry-leading cloud technologies
- Working knowledge of GitOps and DevSecOps

Good to Have Skills:
- Linux, Kubernetes, Ansible, Spring Web Services, Microservices
- Agile proficient and knowledgeable/experienced in other agile methodologies; ideally certified
- Strong communication and networking skills
- Ability to work autonomously and take accountability to execute and deliver on goals
- Strong commitment to high quality standards
- Good communication skills and a sense of ownership to work as an individual contributor

Posted 2 months ago

Apply

7.0 - 10.0 years

9 - 12 Lacs

Hyderabad

Hybrid

Responsibilities of the Candidate:
- Be responsible for the design and development of big data solutions; partner with domain experts, product managers, analysts, and data scientists to develop Big Data pipelines in Hadoop.
- Be responsible for moving all legacy workloads to a cloud platform.
- Work with data scientists to build client pipelines using heterogeneous sources and provide engineering services for PySpark data science applications.
- Ensure automation through CI/CD across platforms, both in the cloud and on-premises.
- Define needs around maintainability, testability, performance, security, quality, and usability for the data platform.
- Drive implementation, consistent patterns, reusable components, and coding standards for data engineering processes.
- Convert SAS-based pipelines into languages like PySpark and Scala to execute on Hadoop and non-Hadoop ecosystems.
- Tune big data applications on Hadoop and non-Hadoop platforms for optimal performance.
- Apply an in-depth understanding of how data analytics collectively integrate within the sub-function, and coordinate and contribute to the objectives of the entire function.
- Produce a detailed analysis of issues where the best course of action is not evident from the information available, but actions must be recommended/taken.
- Assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients, and assets, by driving compliance with applicable laws, rules, and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct, and business practices, and escalating, managing, and reporting control issues with transparency.

Requirements:
- 6+ years of total IT experience
- 3+ years of experience with Hadoop (Cloudera) / big data technologies
- Knowledge of the Hadoop ecosystem and Big Data technologies; hands-on experience with the Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka, Kudu, Solr)
- Experience in designing and developing data pipelines for data ingestion or transformation using Java, Scala, or Python
- Experience with Spark programming (PySpark, Scala, or Java)
- Hands-on experience with Python/PySpark/Scala and basic libraries for machine learning is required
- Proficient in programming in Java or Python, with prior Apache Beam/Spark experience a plus
- Hands-on experience in CI/CD, scheduling, and scripting
- Ensure automation through CI/CD across platforms, both in the cloud and on-premises
- System-level understanding: data structures, algorithms, distributed storage and compute
- Can-do attitude toward solving complex business problems, good interpersonal skills, and teamwork
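To illustrate the "convert SAS-based pipelines into PySpark" responsibility above, here is a minimal, hedged sketch of a PROC SQL-style aggregation rewritten as Spark SQL over Hive tables. The schema, table, and column names are illustrative assumptions, not the employer's actual data model.

```python
# Minimal SAS-to-PySpark migration sketch: a PROC SQL group-by rewritten as Spark SQL.
# The database, table, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sas-migration").enableHiveSupport().getOrCreate()

# Roughly equivalent to:
#   PROC SQL; CREATE TABLE summary AS
#     SELECT region, SUM(balance) AS total_balance FROM accounts GROUP BY region; QUIT;
summary = spark.sql("""
    SELECT region, SUM(balance) AS total_balance
    FROM risk.accounts
    GROUP BY region
""")

summary.write.mode("overwrite").saveAsTable("risk.region_balance_summary")
```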

Posted 2 months ago

Apply

2.0 - 5.0 years

3 - 6 Lacs

Hyderabad, Telangana, India

On-site

We are seeking a PySpark Developer with IT experience. The ideal candidate will possess strong PySpark knowledge and hands-on experience in SQL, HDFS, Hive, Spark, PySpark, and Python. You will be instrumental in developing and optimizing data pipelines, working with large datasets, and implementing data processing solutions, particularly within the Azure Databricks environment.

Key Responsibilities:
- PySpark Development: Design, develop, and maintain robust and scalable data processing solutions using PySpark and Python.
- Data Lake & Warehousing: Work with large datasets stored in HDFS and Hive, applying partitioning and bucketing concepts for optimized data storage and retrieval.
- SQL & Data Processing: Utilize SQL and PySpark for efficient data manipulation, transformation, and processing.
- Azure Databricks: Develop and deploy solutions on Databricks with Azure, including PySpark notebook development.
- ETL & Data Pipelines: Build and optimize data pipelines, demonstrating a good understanding of Hadoop and Spark architectures.
- SCD Implementation: Get involved in SCD (Slowly Changing Dimension) Type 1 and Type 2 implementation.
- Collaboration: Work closely with data scientists and other engineers, contributing to data preparation for analytical and machine learning models.
- Performance Optimization: Ensure the performance and efficiency of data processing jobs.

Required Skills and Experience:
- 8+ years of total IT experience, with 5+ years of relevant work experience as a data engineer/developer.
- Strong PySpark knowledge and hands-on development experience.
- Proficiency in SQL, HDFS, Hive, Spark, PySpark, and Python.
- Good understanding of Hadoop and Spark architectures.
- Good understanding of partitioning and bucketing concepts in Hive.
- Good understanding of data and data processing using SQL or PySpark.
- Good experience writing code in Python and PySpark.
- Good experience with Databricks on Azure and PySpark notebook development.
- Involvement in SCD Type 1 and Type 2 implementation.

Mandatory Skills: PySpark Developer, Azure Stack
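Because the role calls out SCD Type 1 and Type 2 implementation on Databricks with Azure, here is a simplified, hedged SCD Type 2 sketch using Delta Lake: expire changed current rows, then append the new versions. The table names, the tracked "address" column, and the two-step approach are illustrative assumptions, not the employer's actual design.

```python
# Simplified SCD Type 2 sketch with Delta Lake: expire changed rows, then append new versions.
# Table names, the tracked column, and the change rule are hypothetical placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()      # on Databricks an active session already exists

updates = spark.table("staging.customer_updates")             # hypothetical staging table
current = spark.table("dim.customer").filter("is_current = true")

# Rows that are brand new or whose tracked attribute changed.
changed = (
    updates.alias("s")
    .join(current.alias("t"), F.col("s.customer_id") == F.col("t.customer_id"), "left")
    .filter("t.customer_id IS NULL OR t.address <> s.address")
    .select("s.*")
    .cache()
)

# Step 1: expire the current versions of changed customers.
(
    DeltaTable.forName(spark, "dim.customer").alias("t")
    .merge(changed.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": F.lit(False), "end_date": F.current_date()})
    .execute()
)

# Step 2: append the new versions as the current rows.
(
    changed.withColumn("is_current", F.lit(True))
           .withColumn("start_date", F.current_date())
           .withColumn("end_date", F.lit(None).cast("date"))
           .write.format("delta").mode("append").saveAsTable("dim.customer")
)
```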

Posted 2 months ago

Apply

2.0 - 5.0 years

3 - 10 Lacs

Hyderabad, Telangana, India

On-site

No. of years of experience: 5+
Skill Set: Big Data Testing - Hadoop, HDFS, Hive, Kafka, Spark, SQL, UNIX
Mandatory Skills: Big Data Testing - Hadoop, HDFS, Hive, Kafka, Spark, SQL, UNIX
Good to Have Skills: Big Data Testing - Hadoop, HDFS, Hive, Kafka, Spark, SQL, UNIX

Posted 2 months ago

Apply

3.0 - 7.0 years

0 Lacs

Karnataka

On-site

As an Associate Product Support Engineer focused on Hadoop distributed systems, you will be responsible for providing technical support to enterprise clients. Your main tasks will involve troubleshooting and resolving issues within Hadoop environments, ensuring the stability and reliability of our customers' infrastructures. Working closely with experienced engineers, you will collaborate with customers to understand their problems and deliver effective solutions with empathy and professionalism.

Your key responsibilities will include addressing customer issues related to Hadoop clusters and core components (HDFS, YARN, MapReduce, Hive, etc.), performing basic administrative tasks such as installation, configuration, and upgrades, and documenting troubleshooting steps and solutions for knowledge sharing.

To excel in this role, you should have a minimum of 3 years of hands-on experience as a Hadoop Administrator or in a similar support role. A strong understanding of Hadoop architecture and core components, along with proven experience in troubleshooting Hadoop-related issues, is essential. Proficiency in Linux operating systems, good communication skills, and excellent problem-solving abilities are also required. Experience with components like Spark, NiFi, and HBase, as well as exposure to data security and data engineering principles within Hadoop environments, will be advantageous.

Furthermore, prior experience in a customer-facing technical support role and familiarity with tools like Salesforce and Jira are considered beneficial, and knowledge of automation and scripting languages like Python and Bash is a plus. This role offers an opportunity for candidates passionate about Hadoop administration and customer support to deepen their expertise in a focused, high-impact environment.

Posted 2 months ago

Apply

6.0 - 11.0 years

19 - 27 Lacs

Haryana

Work from Office

About the Company: Founded in 2011, ReNew is one of the largest renewable energy companies globally, with a leadership position in India. Listed on Nasdaq under the ticker RNW, ReNew develops, builds, owns, and operates utility-scale wind energy projects, utility-scale solar energy projects, utility-scale firm power projects, and distributed solar energy projects. In addition to being a major independent power producer in India, ReNew is evolving to become an end-to-end decarbonization partner, providing solutions in a just and inclusive manner in the areas of clean energy, green hydrogen, value-added energy offerings through digitalisation, storage, and carbon markets that increasingly are integral to addressing climate change. With a total capacity of more than 13.4 GW (including projects in the pipeline), ReNew's solar and wind energy projects are spread across 150+ sites, with a presence spanning 18 states in India, contributing to 1.9% of India's power capacity. Consequently, this has helped to avoid 0.5% of India's total carbon emissions and 1.1% of India's total power sector emissions. In over 10 years of operation, ReNew has generated almost 1.3 lakh jobs, directly and indirectly. ReNew has achieved market leadership in the Indian renewable energy industry against the backdrop of the Government of India's policies to promote the growth of this sector. ReNew's current group of stockholders contains several marquee investors including CPP Investments, Abu Dhabi Investment Authority, Goldman Sachs, GEF SACEF, and JERA. Its mission is to play a pivotal role in meeting India's growing energy needs in an efficient, sustainable, and socially responsible manner. ReNew stands committed to providing clean, safe, affordable, and sustainable energy for all and has been at the forefront of leading climate action in India.

Job Description - Key Responsibilities:
1. Understand, implement, and automate ETL pipelines in line with industry standards.
2. Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, designing infrastructure for greater scalability, etc.
3. Develop, integrate, test, and maintain existing and new applications.
4. Design and create data pipelines (data lake / data warehouses) for real-world energy analytical solutions.
5. Expert-level proficiency in Python (preferred) for automating everyday tasks.
6. Strong understanding of and experience in distributed computing frameworks, particularly Spark, Spark SQL, Kafka, Spark Streaming, Hive, Azure Databricks, etc.
7. Limited experience in using other leading cloud platforms, preferably Azure.
8. Hands-on experience with Azure Data Factory, Logic Apps, Analysis Services, Azure Blob Storage, etc.
9. Ability to work in a team in an agile setting, familiarity with JIRA, and a clear understanding of how Git works.
10. Must have 5-7 years of experience.
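As a concrete reference for the Spark Streaming / Kafka / Azure Databricks stack listed above, here is a minimal, hedged PySpark Structured Streaming sketch that reads JSON telemetry from Kafka and writes it to a Delta table. The broker, topic, schema, and paths are illustrative assumptions.

```python
# Minimal Structured Streaming sketch: Kafka JSON telemetry -> Delta table.
# Broker, topic, schema, and storage paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("telemetry-stream").getOrCreate()

schema = StructType([
    StructField("site_id", StringType()),
    StructField("power_kw", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")         # hypothetical broker
    .option("subscribe", "site-telemetry")                    # hypothetical topic
    .load()
)

# Kafka delivers bytes; cast the value to a string and parse the JSON payload.
parsed = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
)

query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/telemetry")
    .outputMode("append")
    .start("/mnt/delta/telemetry")
)
```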

Posted 2 months ago

Apply

8.0 - 12.0 years

7 - 11 Lacs

Pune

Work from Office

- Experience with ETL processes and data warehousing
- Proficient in SQL and Python/Java/Scala
- Team lead experience

Posted 2 months ago

Apply

2.0 - 6.0 years

2 - 5 Lacs

Bengaluru, Karnataka, India

On-site

Responsibilities:
- Develop and maintain data processing workflows using Apache Spark and Scala
- Implement batch and streaming data pipelines
- Optimize Spark jobs for better performance and scalability
- Collaborate with data engineers and analysts to deliver data solutions
- Debug and resolve issues in production big data environments
- Integrate with data storage systems like HDFS, Kafka, and NoSQL databases
- Write clean, maintainable code with best practices in mind

Required Skills:
- Strong programming skills in Scala and hands-on experience with Apache Spark
- Knowledge of Spark Core, Spark SQL, Spark Streaming, and MLlib
- Experience with Hadoop ecosystem components (HDFS, Hive, Kafka)
- Familiarity with functional programming concepts
- Experience with data serialization formats (Parquet, Avro, ORC)
- Version control (Git) and CI/CD understanding
- Good problem-solving and communication skills

Posted 2 months ago

Apply

4.0 - 8.0 years

8 - 12 Lacs

Pune

Hybrid

So, what's the role all about?
We are looking for a highly driven and technically skilled Software Engineer to lead the integration of various Content Management Systems with AWS Knowledge Hub, enabling advanced Retrieval-Augmented Generation (RAG) search across heterogeneous customer data without requiring data duplication. This role will also be responsible for expanding the scope of Knowledge Hub to support non-traditional knowledge items and enhance customer self-service capabilities. You will work at the intersection of AI, search infrastructure, and developer experience to make enterprise knowledge instantly accessible, actionable, and AI-ready.

How will you make an impact?
- Integrate CMS with AWS Knowledge Hub to allow seamless RAG-based search across diverse data types, eliminating the need to copy data into Knowledge Hub instances.
- Extend Knowledge Hub capabilities to ingest and index non-knowledge assets, including structured data, documents, tickets, logs, and other enterprise sources.
- Build secure, scalable connectors to read directly from customer-maintained indices and data repositories.
- Enable self-service capabilities for customers to manage content sources using AppFlow and Tray.ai, configure ingestion rules, and set up search parameters independently.
- Collaborate with the NLP/AI team to optimize relevance and performance for RAG search pipelines.
- Work closely with product and UX teams to design intuitive, powerful experiences around self-service data onboarding and search configuration.
- Implement data governance, access control, and observability features to ensure enterprise readiness.

Have you got what it takes?
- Proven experience with search infrastructure, RAG pipelines, and LLM-based applications.
- 5+ years' hands-on experience with AWS Knowledge Hub, AppFlow, Tray.ai, or equivalent cloud-based indexing/search platforms.
- Strong backend development skills (Python, TypeScript/Node.js, .NET/Java) and familiarity with building and consuming REST APIs.
- Knowledge of Infrastructure as Code (IaC) services such as AWS CloudFormation and CDK.
- Deep understanding of data ingestion pipelines, index management, and search query optimization.
- Experience working with unstructured and semi-structured data in real-world enterprise settings.
- Ability to design for scale, security, and multi-tenant environments.

What's in it for you?
Join an ever-growing, market-disrupting, global company where the teams, comprised of the best of the best, work in a fast-paced, collaborative, and creative environment! As the market leader, every day at NICE is a chance to learn and grow, and there are endless internal career opportunities across multiple roles, disciplines, domains, and locations. If you are passionate, innovative, and excited to constantly raise the bar, you may just be our next NICEr!

Enjoy NICE-FLEX! At NICE, we work according to the NICE-FLEX hybrid model, which enables maximum flexibility: 2 days working from the office and 3 days of remote work each week. Naturally, office days focus on face-to-face meetings, where teamwork and collaborative thinking generate innovation, new ideas, and a vibrant, interactive atmosphere.

Reporting into: Tech Manager, Engineering, CX
Role Type: Individual Contributor

Posted 2 months ago

Apply

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.
