Posted: 3 weeks ago
Job Description: AWS Data Engineering & CI/CD Expert (Glue & EMR)
Location: Remote
Employment Type: Contract
Experience Level: 7–12 years (hands-on AWS data engineering experience)
About the Role
We are seeking an AWS Data Engineering & CI/CD Expert with deep, hands-on experience building and optimizing large-scale data pipelines on AWS Glue and Amazon EMR. The role involves designing and implementing efficient ETL/ELT workflows, setting up CI/CD pipelines for Spark-based jobs, and advising on cost optimization and architecture trade-offs between Glue and EMR.
You will be a key contributor to modernizing our data platform, which processes millions of records daily from S3 → Spark (Glue/EMR) → downstream systems (e.g., Solr, Redshift, analytics platforms).
Key Responsibilities
Architecture & Optimization
• Design, build, and optimize large-scale ETL/ELT pipelines using AWS Glue and Amazon EMR.
• Evaluate and recommend the right compute platform (Glue vs EMR vs EMR Serverless) for different workloads with a focus on cost efficiency, scalability, and performance.
• Implement incremental processing strategies (CDC, Iceberg/Hudi, partitioning, bookmarks) to minimize compute costs.
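To make the Glue-vs-EMR cost trade-off concrete, here is a minimal back-of-the-envelope sketch of the kind of cost-per-run comparison this role would formalize. The per-unit prices are illustrative assumptions (roughly us-east-1-shaped numbers), not current AWS pricing; check the pricing pages before relying on them.

```python
# Rough cost-per-run comparison for a Spark workload on Glue vs. EMR.
# All prices below are ASSUMPTIONS for illustration; verify against
# current AWS pricing.

GLUE_PRICE_PER_DPU_HOUR = 0.44       # assumed standard Glue rate
GLUE_FLEX_PRICE_PER_DPU_HOUR = 0.29  # assumed Glue Flex rate
EMR_NODE_PRICE_PER_HOUR = 0.30       # assumed EC2 + EMR uplift per node

def glue_cost(dpus: int, runtime_hours: float, flex: bool = False) -> float:
    """Glue bills per DPU-hour; Flex jobs trade startup latency for a lower rate."""
    rate = GLUE_FLEX_PRICE_PER_DPU_HOUR if flex else GLUE_PRICE_PER_DPU_HOUR
    return dpus * runtime_hours * rate

def emr_cost(nodes: int, runtime_hours: float) -> float:
    """EMR cost is dominated by instance-hours across the cluster."""
    return nodes * runtime_hours * EMR_NODE_PRICE_PER_HOUR

# A 10-DPU Glue job for 2 hours vs. a 10-node EMR cluster for 2 hours.
print(round(glue_cost(10, 2.0), 2))             # 8.8
print(round(glue_cost(10, 2.0, flex=True), 2))  # 5.8
print(round(emr_cost(10, 2.0), 2))              # 6.0
```

Incremental processing (bookmarks, CDC, Iceberg/Hudi) attacks the `runtime_hours` term directly, which is why it is usually the highest-leverage cost control.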
CI/CD & Automation
• Build and maintain CI/CD pipelines (Terraform, GitHub Actions, CodePipeline, Jenkins, or equivalent) for deploying Glue jobs, EMR clusters, and Spark applications.
• Automate infrastructure provisioning using Terraform or AWS CDK with best practices in security and compliance.
• Set up monitoring, logging, and alerting for job performance, failures, and cost anomalies.
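As a sketch of the cost-anomaly alerting mentioned above, the check below flags failed runs and runs whose compute usage spikes well past the recent baseline. The record shape and the 2x-median threshold are illustrative assumptions, not a Glue API contract; in practice the records would come from CloudWatch metrics or the Glue job-runs API.

```python
# Minimal sketch of a cost-anomaly check over Glue job-run records.
# Record fields ("run_id", "state", "dpu_seconds") are assumed placeholders.
from statistics import median

def find_anomalous_runs(runs: list[dict], factor: float = 2.0) -> list[str]:
    """Flag failed runs, plus runs using more than `factor` x the median DPU-seconds."""
    baseline = median(r["dpu_seconds"] for r in runs)
    return [
        r["run_id"]
        for r in runs
        if r["state"] == "FAILED" or r["dpu_seconds"] > factor * baseline
    ]

runs = [
    {"run_id": "jr_1", "state": "SUCCEEDED", "dpu_seconds": 1_200},
    {"run_id": "jr_2", "state": "SUCCEEDED", "dpu_seconds": 1_150},
    {"run_id": "jr_3", "state": "FAILED",    "dpu_seconds": 300},
    {"run_id": "jr_4", "state": "SUCCEEDED", "dpu_seconds": 4_000},
]
print(find_anomalous_runs(runs))  # ['jr_3', 'jr_4']
```

Wiring a check like this into the CI/CD pipeline (e.g., as a post-deploy smoke step) catches cost regressions before they accumulate.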
Data Platform Integration
• Enable CDC pipelines from relational databases (e.g., DB2, Oracle) into S3, Redshift, or Solr.
• Work closely with downstream systems (Solr, Redshift, OpenSearch) to ensure fast and consistent indexing.
• Optimize file formats and storage layouts (Parquet, ORC, partitioning, compaction).
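The storage-layout work above typically means writing Parquet/ORC into Hive-style partitioned S3 prefixes. A small sketch of that key scheme, with placeholder table and partition-column names:

```python
# Illustrative helper for a Hive-style partitioned S3 layout, the kind of
# prefix scheme Glue/EMR jobs write Parquet into and Glue crawlers/Athena
# can prune on. Table and column names are placeholders.
from datetime import date

def partition_prefix(table: str, run_date: date) -> str:
    """Build an S3 key prefix like table/year=YYYY/month=MM/day=DD/."""
    return (
        f"{table}/"
        f"year={run_date.year}/"
        f"month={run_date.month:02d}/"
        f"day={run_date.day:02d}/"
    )

print(partition_prefix("orders", date(2024, 3, 7)))
# orders/year=2024/month=03/day=07/
```

Partition pruning on prefixes like this, plus periodic compaction of small files, is what keeps both scan costs and downstream indexing latency low.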
Governance & Cost Control
• Define standards for job scheduling, versioning, and rollback.
• Set up budgets and cost monitoring for Glue/EMR workloads, ensuring cost-per-run visibility.
• Conduct periodic architecture reviews and suggest optimizations (e.g., Flex Jobs, Spot instances, auto-scaling).
Required Skills & Experience
• Strong expertise with AWS Glue (ETL, Flex Jobs, bookmarks, Spark tuning) and Amazon EMR (managed clusters, EMR Serverless, Spot instances).
• Hands-on experience with Apache Spark (PySpark/Scala) in production at scale.
• Proficiency in Terraform (or AWS CDK/CloudFormation) for IaC.
• Experience with CI/CD pipelines (GitHub Actions, Jenkins, AWS CodePipeline/CodeBuild).
• Familiarity with CDC frameworks (AWS DMS, Debezium, Hudi/Iceberg) for incremental ingestion.
• Strong knowledge of S3 best practices (Parquet/ORC, partitioning, compaction).
• Experience integrating with downstream systems like Solr, Redshift, OpenSearch.
• Solid background in cost optimization, job scheduling, and monitoring (CloudWatch, Prometheus, Datadog, etc.).
Preferred Qualifications
• Experience running Spark on Kubernetes (EKS).
• Exposure to streaming pipelines (Kafka/MSK, Kinesis).
• Familiarity with data governance and security (IAM, Lake Formation).
• AWS Certified Big Data – Specialty (or its successor, Data Analytics – Specialty) or AWS Certified Solutions Architect.
Marktine IT Solutions
Salary: Not disclosed