Sprouts.ai - Senior Data Architect - ETL/PySpark

Experience: 10 years
Salary: 0 Lacs
Posted: 1 day ago | Platform: LinkedIn

Work Mode: On-site
Job Type: Full Time

Job Description

Job Title : Senior Data Architect

Location : Bangalore/Chandigarh

Job Type : Full-time

Experience : 10+ years

Job Summary

We are looking for an experienced Data Architect to lead the design, development, and optimization of our modern data infrastructure. The ideal candidate will have deep expertise in big data platforms, data lakes, and lakehouse architectures, along with hands-on experience with modern tools such as Spark clusters, PySpark, Apache Iceberg, the Nessie catalog, and Apache Airflow.

This role will be pivotal in evolving our data platform, including database migrations, scalable data pipelines, and governance-ready architectures for both analytical and operational use cases.

Key Responsibilities

  • Design and implement scalable and reliable data architectures for real-time and batch processing systems
  • Evaluate and recommend data tools, frameworks, and infrastructure aligned with company goals
  • Develop and optimize complex ETL/ELT pipelines using PySpark and Apache Airflow
  • Architect and manage data lakes using Spark on Apache Iceberg and the Nessie catalog for versioned and governed data workflows (see the sketch after this list)
  • Perform data analysis, data profiling, data quality improvements, and data modeling
  • Lead database migration efforts, including planning, execution, and optimization
  • Define and enforce data engineering best practices, data governance standards, and schema evolution strategies
  • Collaborate cross-functionally with data scientists, analysts, platform engineers, and business stakeholders
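
The responsibilities above center on Spark writing Apache Iceberg tables tracked in a Nessie catalog. The following is a minimal PySpark sketch of that pattern, not a definitive implementation: the catalog name, Nessie endpoint, branch, warehouse location, and data paths are hypothetical placeholders, and it assumes the Iceberg Spark runtime and Nessie extension jars are on the classpath.

from pyspark.sql import SparkSession

# Minimal sketch: a Spark session wired to an Iceberg catalog backed by Nessie.
# The catalog name ("nessie"), endpoint, branch, and warehouse are hypothetical.
spark = (
    SparkSession.builder
    .appName("iceberg-nessie-etl-sketch")
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.nessie.uri", "http://nessie:19120/api/v1")
    .config("spark.sql.catalog.nessie.ref", "main")  # Nessie branch: git-like versioning
    .config("spark.sql.catalog.nessie.warehouse", "s3a://lakehouse/warehouse/")
    .getOrCreate()
)

# A simple batch step: read raw files, apply basic quality rules, and publish
# the result as a governed, versioned Iceberg table on the chosen branch.
raw = spark.read.parquet("s3a://lakehouse/raw/events/")  # hypothetical source path
cleaned = raw.dropDuplicates(["event_id"]).filter("event_ts IS NOT NULL")
cleaned.writeTo("nessie.analytics.events").createOrReplace()

Because Nessie tracks table state on branches, a pipeline like this can write to a feature branch and merge to main only after validation, which is what enables the versioned, governed workflows described above.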

Skills & Qualifications

  • 10+ years of experience in data architecture, data engineering, data security, data governance, and big data platforms
  • Deep understanding of trade-offs between managed services and open-source data stack tools, including cost, scalability, operational overhead, flexibility, and vendor lock-in
  • Strong hands-on experience with PySpark for writing data pipelines and distributed data processing
  • Proven expertise with Apache Iceberg, Apache Hudi, and the Nessie catalog for modern table formats and versioned data catalogs
  • Experience in scaling and managing Elasticsearch and PostgreSQL clusters
  • Strong experience with Apache Airflow (or equivalent tools) for workflow orchestration; a minimal sketch follows this list
  • Demonstrated success in database migration projects across multiple cloud providers
  • Ability to perform deep data analysis and compare datasets between systems
  • Experience handling hundreds of terabytes of data or more
  • Proficiency in SQL, data modeling, and performance tuning
  • Excellent communication and presentation skills, with the ability to lead technical conversations
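
To make the orchestration requirement concrete, here is a bare-bones Apache Airflow DAG that schedules the PySpark job sketched above. It is an illustrative sketch only: it assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed, and the DAG id, connection id, and application path are hypothetical.

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Minimal sketch: one daily task that spark-submits the ETL script.
# dag_id, conn_id, and the application path are hypothetical placeholders.
with DAG(
    dag_id="nightly_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ keyword; older releases use schedule_interval
    catchup=False,
) as dag:
    transform_events = SparkSubmitOperator(
        task_id="transform_events",
        application="/opt/airflow/jobs/transform_events.py",  # PySpark job file
        conn_id="spark_default",  # Airflow connection pointing at the Spark cluster
    )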

Nice To Have

  • Experience in Sales, Marketing, and CRM domains, especially with Accounts and Contacts data
  • Knowledge of AI and vector databases
  • Exposure to streaming data frameworks such as Kafka or Flink (a minimal sketch follows this list)
  • Ability to support analytics and reporting initiatives
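
For the streaming exposure above, a bare-bones PySpark Structured Streaming consumer might look like the sketch below. It assumes the spark-sql-kafka-0-10 connector package is available at submit time; the broker address, topic name, and checkpoint location are hypothetical placeholders.

from pyspark.sql import SparkSession

# Minimal sketch: read a Kafka topic as a stream and echo decoded payloads.
# Broker, topic, and checkpoint path are hypothetical; requires the
# spark-sql-kafka-0-10 connector package on the classpath.
spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "events")
    .load()
)

query = (
    events.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()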

Why Join Us

  • Work on cutting-edge data architectures using modern open-source technologies
  • Be part of a team transforming data operations and analytics at scale
  • Opportunity to architect high-impact systems from the ground up
  • Join a collaborative, innovation-driven culture
(ref:hirist.tech)
