Data Engineer - Scala Spark

Experience: 3 years

Salary: 0 Lacs

Posted: 4 days ago | Platform: LinkedIn


Work Mode: On-site

Job Type: Full Time

Job Description

Role Summary:

Design, build, and optimize large-scale ETL and data-processing pipelines handling GB–TB volumes. Operate within the Databricks ecosystem and drive migration of selected workloads to high-performance engines such as Polars and DuckDB. Maintain strong engineering rigor across CI/CD, testing, and code-quality enforcement. Apply analytical thinking to solve data reliability, performance, and scalability problems. AI familiarity is advantageous.


Core Responsibilities:

  • Develop and maintain distributed data pipelines using Scala, Spark, Delta, and Databricks.
  • Engineer robust ETL workflows tuned for high-volume ingestion, transformation, and publishing.
  • Profile pipelines, remove bottlenecks, and optimize compute, storage, and job orchestration.
  • Lead migration of suitable workloads to Polars, DuckDB, or equivalent high-performance engines.
  • Implement CI/CD workflows with automated builds, tests, deployments, and environment gating.
  • Enforce coding standards through code coverage targets, unit/integration tests, and SonarQube rules.
  • Ensure pipeline observability: logging, data quality checks, lineage, and failure diagnostics.
  • Apply analytical reasoning to triage complex data issues and deliver root-cause clarity.
  • Contribute to AI-aligned initiatives when required: RAG design, fine-tuning workflows, agentic patterns.
  • Collaborate with product, analytics, and platform teams to operationalize data solutions.


Required Skills and Experience:

  • 3+ years in data engineering with strong command of Scala and Spark.
  • Proven background in ETL design, distributed processing, and high-volume data systems.
  • Hands-on experience with Databricks (jobs, clusters, notebooks, Delta Lake).
  • Proficiency in workflow optimization, performance tuning, and memory management.
  • Experience with Polars, DuckDB, or similar columnar/accelerated engines.
  • CI/CD discipline using Git-based pipelines; strong testing and code-quality practices.
  • Familiarity with SonarQube, coverage metrics, and static analysis.
  • Strong analytical and debugging capability across data, pipelines, and infra.
  • Exposure to AI concepts: embeddings, vector stores, retrieval-augmented generation, fine-tuning, agentic architectures.

Preferred:

  • Experience with Azure cloud environments.
  • Experience with metadata-driven or config-driven pipeline frameworks.
