Job Title: Senior Big Data Engineer
Department: Data & Analytics
Employment Type: Full-time
Role Overview
We are seeking an experienced Big Data Engineer to join our growing Data & Analytics team. In this role, you'll design, build, and optimize our modern data platform, enabling scalable pipelines, reliable data lakes, and robust ML operations. You'll work closely with data scientists, analysts, and stakeholders to turn raw data into actionable insights.
Key Responsibilities
Data Platform Development & Maintenance
- Design, deploy, and maintain ETL/ELT pipelines using Apache Airflow and dbt/Meltano.
- Architect and manage data lakes on HDFS and object storage (e.g., S3).
- Implement table formats and governance with Apache Iceberg and Unity Catalog, Apache Ranger, or Apache Atlas.
Big Data Processing
- Develop distributed data processing jobs in Apache Spark (Scala/Java/Python).
- Optimize Spark applications for performance, cost, and scalability.
- Apply strong working knowledge of relational and NoSQL databases (e.g., PostgreSQL, MongoDB, SQL Server).
Collaboration with Data Science Teams
- Integrate MLflow for experiment tracking, model registry, and seamless deployment.
- Collaborate with data science teams to operationalize models in production.
Containerization & Orchestration
- Containerize data services and workflows using Docker/Kubernetes.
- Orchestrate deployments on Kubernetes clusters (EKS/GKE/AKS).
Center of Excellence & Best Practices
- Contribute to and govern the dbt Center of Excellence: define standards, code reviews, and documentation.
- Advocate for and enforce data engineering best practices: CI/CD, testing, monitoring, and observability.
Cross-Functional Collaboration
- Partner with Analytics, Product, and Engineering teams to understand requirements and deliver solutions.
- Mentor junior engineers and share knowledge through code reviews and technical talks.
Required Skills & Experience
- 5+ years of professional experience in Big Data engineering or similar roles.
- Airflow: Hands-on experience authoring DAGs, configuring operators/sensors, and managing scheduler/executor backends (a minimal illustrative DAG sketch follows this list).
- Apache Spark: Proficient in building and tuning large-scale ETL jobs (Scala/Python).
- Apache Iceberg & Unity Catalog: Familiarity with table formats, partitioning strategies, and catalog management.
- Data Lake & HDFS: Solid understanding of storage architectures, file formats (Parquet/ORC), and data partitioning.
- MLflow: Working knowledge of experiment tracking, model registry, and deployment workflows.
- Kubernetes: Skilled in deploying, scaling, and debugging containerized applications and stateful workloads.
- dbt CoE: Experience with dbt for data modeling, CI/CD integration, and establishing a Center of Excellence for standards.
- Strong programming skills in Python and/or Scala.
- Experience with version control (Git), CI/CD pipelines (Jenkins/GitHub Actions), and monitoring tools (Prometheus, Grafana).
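
For candidates gauging what "authoring DAGs" means in practice, the sketch below is a minimal, hypothetical Airflow pipeline. The DAG id, task names, and callables are invented for illustration, the code assumes Airflow 2.4+, and it is not part of our codebase.

```python
# Minimal illustrative Airflow DAG; all names here are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    # Placeholder: pull raw data from a source system for the run date.
    print("extracting orders for", context["ds"])


def load_to_lake(**context):
    # Placeholder: write the day's partition as Parquet to the data lake.
    print("loading partition", context["ds"])


with DAG(
    dag_id="nightly_orders_etl",      # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow 2.4+; older releases use schedule_interval
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_to_lake", python_callable=load_to_lake)
    extract >> load                   # simple linear dependency
```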
Nice to Have
- Familiarity with cloud platforms (AWS, GCP, or Azure) and managed services (EMR, Dataproc, EKS, GKE).
- Knowledge of additional data catalogs (e.g., AWS Glue, Hive Metastore).
- Background in event-driven architectures (Kafka, Pulsar).
- Prior involvement in data governance or compliance initiatives.
Education & Certifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Relevant certifications (e.g., any AWS, GCP, Azure, or Databricks big data certification) are a plus.
Soft Skills
- Excellent problem-solving and analytical abilities.
- Strong written and verbal communication; able to explain technical concepts to non-technical stakeholders.
- Proactive mindset with a passion for learning and driving innovation.
- Collaborative team player who thrives in agile, fast-paced environments.