Senior AWS Data Engineer (PySpark & Python) — On-site, India
Industry & Sector:
IT services and cloud data engineering, with a focus on end-to-end data platforms, analytics, and enterprise-scale ETL/ELT solutions. We deliver production-grade data pipelines, real-time streaming, and analytics integrations for large enterprise customers across finance, retail, and SaaS domains.

We are hiring an experienced Data Engineer to join an on-site team in India to build, optimize, and operate scalable AWS-based data platforms using Python and PySpark. This role requires 5+ years of hands-on data engineering experience and a strong operational mindset.

Role & Responsibilities
- Design, develop, and maintain robust ETL/ELT pipelines on AWS (S3 → Glue/EMR → Redshift/Snowflake) using Python and PySpark (an illustrative PySpark sketch follows this list).
- Implement efficient Spark jobs, optimize query performance, and reduce pipeline latency for batch and near-real-time workflows.
- Build and manage orchestration with Apache Airflow (DAGs, sensors, SLA alerts) and integrate with monitoring and alerting (an illustrative Airflow sketch follows this list).
- Author reusable data models, enforce data quality checks, and implement observability (logs, metrics, lineage).
- Collaborate with data consumers and with analytics and ML teams to translate requirements into scalable data contracts and schemas.
- Apply infrastructure-as-code and CI/CD practices to deploy data platform components and automate testing/rollouts.
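To make the day-to-day work concrete, here is a minimal, hypothetical sketch of the kind of batch job described above: it reads raw JSON from S3, applies light cleansing, and writes partitioned Parquet to a curated zone. The bucket names, paths, and column names are placeholders, not details of our actual platform.

```python
# Hypothetical example: buckets, prefixes, and columns are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_batch").getOrCreate()

# Read raw JSON events landed in S3 (placeholder location).
raw = spark.read.json("s3://example-raw-zone/orders/2024-01-01/")

# Light cleansing and a simple data-quality guard before publishing.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write partitioned Parquet so Glue/Redshift Spectrum consumers can query it efficiently.
(cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated-zone/orders/"))
```

And a similarly hypothetical Airflow DAG illustrating the orchestration style referenced above (daily schedule, retries, and an SLA). The DAG id, schedule, and submit callable are placeholders rather than this team's actual pipeline; in practice the task would likely submit to EMR or Glue.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def submit_spark_job(**context):
    # Placeholder: a real task would submit the PySpark job to EMR, Glue, or spark-submit.
    print("submitting batch job for", context["ds"])


with DAG(
    dag_id="orders_daily_batch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    run_batch = PythonOperator(
        task_id="run_orders_batch",
        python_callable=submit_spark_job,
        sla=timedelta(hours=2),  # an SLA miss triggers Airflow's alerting hooks
    )
```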
Skills & Qualifications
Must-Have
- 5+ years of professional data engineering experience building production pipelines with Python and PySpark/Spark.
- Proven AWS experience: S3, Glue or EMR, Redshift (or an equivalent data warehouse), Lambda, and IAM best practices.
- Strong SQL skills: query tuning, partitioning, and indexing, plus working knowledge of data warehouse architectures.
- Hands-on with orchestration tools (Apache Airflow) and experience implementing monitoring and retry/alert strategies.
- Solid software-engineering fundamentals: unit testing, code reviews, Git-based workflows and CI/CD for data apps.
- Ability to work on-site in India and collaborate cross-functionally in fast-paced delivery cycles.
Preferred
- Experience with streaming platforms (Kafka/Kinesis), schema management, and low-latency processing.
- Familiarity with Terraform/CloudFormation, containerization (Docker), and Kubernetes for data workloads.
- Background in data modeling, columnar formats (Parquet/ORC), and data governance tools.
Benefits & Culture Highlights
- Collaborative, delivery-driven culture with strong focus on technical mentorship and upskilling.
- Opportunity to work on large-scale AWS data platforms and cross-domain analytics projects.
- Competitive compensation, professional development, and a stable on-site engineering environment in India.
If you are a pragmatic, hands-on Data Engineer who thrives on building reliable AWS data platforms with Python and PySpark, we want to hear from you. Apply to join a high-performing team delivering measurable business impact.
Skills: Python, AWS, PySpark