Spark Data Engineer / Developer (DevOps & OpenShift)

Experience

9 years

Posted: 3 weeks ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Introduction

Joining the IBM Technology Expert Labs team means you'll have a career delivering world-class services for our clients. As the ultimate expert in IBM products, you'll bring together all the necessary technology and services to help customers solve their most challenging problems. Working in IBM Technology Expert Labs means accelerating the time to value confidently and ensuring speed and insight while our clients focus on what they do best: running and growing their business.

An excellent onboarding experience and an industry-leading learning culture will set you up for positive impact while advancing your career. Our culture is collaborative and experiential. As part of a team, you will be surrounded by bright minds and keen co-creators, always willing to help and be helped, as you apply passion to work that will positively impact the world around us.

Your Role And Responsibilities

About the Role

We are seeking a highly skilled and experienced Spark Data Engineer / Developer to join our dynamic team. This role is critical for building, optimizing, and supporting our cutting-edge data platform, leveraging Apache Spark, Apache Iceberg, and a robust DevOps approach within an OpenShift environment. The ideal candidate will be adept at both developing high-performance data solutions and ensuring their stability and reliability in a production setting.

Key Responsibilities
  • Design, develop, and optimize scalable and resilient data processing applications using Apache Spark (batch, streaming, and real-time).
  • Implement and manage data pipelines, ensuring data quality, consistency, and performance.
  • Perform Spark job performance tuning and optimization to handle large-scale datasets efficiently.
  • Manage and automate the deployment of Spark applications within OpenShift clusters, utilizing Docker and Kubernetes.
  • Establish and maintain CI/CD pipelines for automated testing, deployment, and release management of Spark workloads.
  • Provide comprehensive production support for critical Spark jobs, including proactive monitoring, troubleshooting, debugging, and participation in on-call rotations.
  • Work extensively with Apache Iceberg table format, leveraging its capabilities for schema evolution, time travel, hidden partitioning, and ACID transactions.
  • Collaborate closely with data scientists, other data engineers, and operations teams to deliver robust and integrated solutions.
  • Develop and maintain documentation for data pipelines, job configurations, and operational procedures.
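
The Iceberg capabilities named above (schema evolution, time travel, hidden partitioning, and ACID transactions) can be sketched in Spark SQL. This is an illustrative fragment, not part of the role's actual codebase: the catalog, table, and column names are hypothetical, and the statements assume a Spark session already configured with an Iceberg catalog named `demo` and a staged `updates` view.

```sql
-- Hidden partitioning: the table is partitioned by days(event_ts),
-- but queries filter on event_ts directly and Iceberg prunes partitions.
CREATE TABLE demo.db.events (
    id       BIGINT,
    payload  STRING,
    event_ts TIMESTAMP
) USING iceberg
PARTITIONED BY (days(event_ts));

-- Schema evolution: a metadata-only change, no data files rewritten.
ALTER TABLE demo.db.events ADD COLUMN source STRING;

-- ACID upsert: MERGE INTO commits atomically as a new table snapshot.
MERGE INTO demo.db.events t
USING updates u
ON t.id = u.id
WHEN MATCHED THEN UPDATE SET t.payload = u.payload
WHEN NOT MATCHED THEN INSERT *;

-- Time travel: read the table as of an earlier point in time.
SELECT * FROM demo.db.events TIMESTAMP AS OF '2024-01-01 00:00:00';
```

Because the partition column is derived rather than stored, consumers never need to know the partitioning scheme, which is what lets it evolve later without breaking queries.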

Preferred Education

Master's Degree

Required Technical And Professional Expertise

Required Skills & Experience
  • 9+ years of hands-on experience in data engineering or software development roles.
  • Expert-level proficiency in Apache Spark (Spark Core, Spark SQL, Spark Streaming).
  • Strong programming skills in Scala, Python (PySpark), or Java.
  • Significant experience with OpenShift, including deploying, managing, and automating containerized applications within the platform.
  • Solid understanding of Docker and Kubernetes for containerization and orchestration.
  • Proven experience implementing and maintaining CI/CD pipelines using tools like Jenkins, GitLab CI, or similar.
  • Demonstrable experience with Apache Iceberg, including practical application of its features like schema evolution, time travel queries, and ACID compliance.
  • Strong background in production support for data applications, including monitoring, troubleshooting, and incident resolution.
  • Understanding and practical application of DevOps principles (Infrastructure as Code, automation, continuous monitoring).
  • Strong SQL skills and experience working with various data sources.
  • Excellent analytical and problem-solving abilities for diagnosing and resolving complex issues in distributed environments.
  • Familiarity with distributed systems concepts and architectures.
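
As one concrete (hypothetical) shape for the CI/CD requirement above, a GitLab CI pipeline for a Spark application might test the code, build a container image, and roll it out to OpenShift with the `oc` CLI. Every job name, image tag, and file path below is an assumption for illustration, not something specified by this posting.

```yaml
# Hypothetical GitLab CI pipeline for a PySpark application on OpenShift.
stages: [test, build, deploy]

test:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt
    - pytest tests/            # unit tests for the Spark job logic

build-image:
  stage: build
  image: docker:24
  services: [docker:24-dind]
  script:
    - docker build -t "$CI_REGISTRY_IMAGE/spark-app:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE/spark-app:$CI_COMMIT_SHA"

deploy:
  stage: deploy
  image: quay.io/openshift/origin-cli:4.14   # assumed image providing oc
  script:
    # Apply the Spark workload manifest to the target OpenShift project.
    - oc apply -f manifests/spark-job.yaml -n data-platform
```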

Desired Skills

Preferred technical and professional experience

  • Experience with other data lake table formats (e.g., Delta Lake, Apache Hudi).
  • Familiarity with cloud platforms (AWS, Azure, GCP) beyond OpenShift.
  • Experience with messaging queues or streaming platforms like Apache Kafka.
  • Contributions to open-source data projects.

IBM

Information Technology

Armonk
