Job Description

This role is for one of our clients.

Industry: Technology, Information and Media
Seniority level: Mid-Senior level
Min Experience: 5 years
Location: Bengaluru, Karnataka, India
Job Type: Full-time

We are seeking a Big Data Engineer with deep technical expertise to join our fast-paced, data-driven team. In this role, you will design and build robust, scalable, high-performance data pipelines that fuel real-time analytics, business intelligence, and machine learning applications across the organization. If you thrive on working with large datasets and cutting-edge technologies, and on solving complex data engineering challenges, this is the opportunity for you.

What You’ll Do

- Design & Build Pipelines: Develop efficient, reliable, and scalable data pipelines that process large volumes of structured and unstructured data using big data tools.
- Distributed Data Processing: Leverage the Hadoop ecosystem (HDFS, Hive, MapReduce) to manage and transform massive datasets.
- Starburst (Trino) Integration: Design and optimize federated queries using Starburst, enabling seamless access across diverse data platforms.
- Databricks Lakehouse Development: Use Spark, Delta Lake, and MLflow on the Databricks Lakehouse Platform to enable unified analytics and AI workloads.
- Data Modeling & Architecture: Work with stakeholders to translate business requirements into flexible, scalable data models and architecture.
- Performance & Optimization: Monitor, troubleshoot, and fine-tune pipeline performance to ensure efficiency, reliability, and data integrity.
- Security & Compliance: Implement and enforce best practices for data privacy, security, and compliance with regulations such as GDPR and CCPA.
- Collaboration: Partner with data scientists, product teams, and business users to deliver impactful data solutions and improve decision-making.

What You Bring

Must-Have Skills
- 5+ years of hands-on experience in big data engineering, data platform development, or similar roles.
- Strong experience with Hadoop, including HDFS, Hive, HBase, and MapReduce.
- Deep understanding and practical use of Starburst (Trino) or Presto for large-scale querying.
- Hands-on experience with the Databricks Lakehouse Platform, Spark, and Delta Lake.
- Proficiency in SQL and in programming languages such as Python or Scala.
- Strong knowledge of data warehousing, ETL/ELT workflows, and schema design.
- Familiarity with CI/CD tools, version control (Git), and workflow orchestration tools (Airflow or similar).

Nice-to-Have Skills
- Experience with cloud environments such as AWS, Azure, or GCP.
- Exposure to Docker, Kubernetes, or infrastructure-as-code tools.
- Understanding of data governance and metadata management platforms.
- Experience supporting AI/ML initiatives with curated datasets and pipelines.