Lead Data Engineer

Experience: 8.0 years

Salary: 0.0 Lacs P.A.

Location: Pune, Maharashtra, India

Posted: 6 days ago | Platform: LinkedIn


Skills Required

Data, Python, Apache Spark, Airflow, Design, Processing, Analytics, Learning, PySpark, Code, ETL, Orchestration, Sensors, Hadoop, AWS, Databricks, Integrity, Security, Governance, Pipeline, Resolve, Support, Architecture, Scalability, Engineering, Development, NumPy, Writing, Workflow, Scheduling, GCP, Azure, Redshift, SQL, NoSQL, Git, Docker, Linux, Kafka, Compliance, Communication, Agile

Work Mode

On-site

Job Type

Full Time

Job Description

Job Summary:

We are looking for a Senior Data Engineer with deep expertise in Python, Apache Spark, and Apache Airflow to design, build, and optimize scalable data pipelines and processing frameworks. You will play a key role in managing large-scale data workflows, ensuring data quality, performance, and timely delivery for analytics and machine learning platforms.

Key Responsibilities:

- Design, develop, and maintain data pipelines using Apache Spark (PySpark) and Airflow for batch and near real-time processing.
- Write efficient, modular, and reusable Python code for ETL jobs, data validation, and transformation tasks.
- Implement robust data orchestration workflows using Apache Airflow (DAGs, sensors, hooks, etc.).
- Work with big data technologies on distributed platforms (e.g., Hadoop, AWS EMR, Databricks).
- Ensure data integrity, security, and governance across all stages of the pipeline.
- Monitor and optimize pipeline performance; resolve bottlenecks and failures proactively.
- Collaborate with data scientists, analysts, and other engineers to support data needs.
- Document architecture, processes, and code to support maintainability and scalability.
- Participate in code reviews, architecture discussions, and production deployments.
- Mentor junior engineers and provide guidance on best practices.

Required Skills:

- 8+ years of experience in data engineering or backend development roles.
- Strong proficiency in Python, including data manipulation (Pandas, NumPy) and writing scalable code.
- Hands-on experience with Apache Spark (preferably PySpark) for large-scale data processing.
- Extensive experience with Apache Airflow for workflow orchestration and scheduling.
- Deep understanding of ETL/ELT patterns, data quality, lineage, and data modeling.
- Familiarity with cloud platforms (AWS, GCP, or Azure) and related services (S3, BigQuery, Redshift, etc.).
- Solid experience with SQL, NoSQL, and file formats such as Parquet, ORC, and Avro.
- Proficiency with CI/CD pipelines, Git, Docker, and Linux-based development environments.

Preferred Qualifications:

- Experience with data lakehouse architectures (e.g., Delta Lake, Iceberg).
- Exposure to real-time streaming technologies (e.g., Kafka, Flink, Spark Streaming).
- Background in machine learning pipelines and MLOps tools (optional).
- Knowledge of data governance frameworks and compliance standards.

Soft Skills:

- Strong problem-solving and communication skills.
- Ability to work independently and lead complex projects.
- Experience working in agile and cross-functional teams.
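The responsibilities above emphasize modular Python code for ETL with built-in data validation. As a rough illustration of what that looks like in practice, here is a minimal, stdlib-only sketch of a validate-then-transform step; the schema (fields `user_id`, `amount`, `ts`) and function names are hypothetical, and a production version would typically run inside a PySpark job or an Airflow task.

```python
from datetime import datetime

def validate_record(record):
    """Return True if a raw record has the fields the transform needs
    (hypothetical schema: user_id, amount, ISO-8601 timestamp ts)."""
    required = {"user_id", "amount", "ts"}
    if not required.issubset(record):
        return False
    try:
        float(record["amount"])
        datetime.fromisoformat(record["ts"])
    except (TypeError, ValueError):
        return False
    return True

def transform(records):
    """Keep valid records, normalize types, and derive a date field."""
    out = []
    for r in records:
        if not validate_record(r):
            continue  # a real pipeline might route these to a dead-letter sink
        out.append({
            "user_id": str(r["user_id"]),
            "amount": round(float(r["amount"]), 2),
            "date": r["ts"][:10],  # ISO timestamp -> calendar date
        })
    return out

raw = [
    {"user_id": 1, "amount": "19.991", "ts": "2024-05-01T10:00:00"},
    {"user_id": 2, "amount": "oops", "ts": "2024-05-01T11:00:00"},  # dropped
]
clean = transform(raw)
```

Keeping validation separate from transformation, as sketched here, is one way to meet the posting's call for reusable code: the same validator can back an ETL job, a unit test, or an Airflow data-quality sensor.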

Shivsys Inc.
