Posted: 3 weeks ago
Job Description: AWS Data Engineering & CI/CD Expert (Glue & EMR)
Location: Remote
Employment Type: Contract
Experience Level: 7–12 years (hands-on AWS data engineering experience)
About the Role
We are seeking an AWS Data Engineering & CI/CD Expert with deep, hands-on experience building and optimizing large-scale data pipelines on AWS Glue and Amazon EMR. The role involves designing and implementing efficient ETL/ELT workflows, setting up CI/CD pipelines for Spark-based jobs, and advising on cost optimization and architecture trade-offs between Glue and EMR.
You will be a key contributor to modernizing our data platform, which processes millions of records daily from S3 → Spark (Glue/EMR) → downstream systems (e.g., Solr, Redshift, analytics platforms).
Key Responsibilities
Architecture & Optimization
• Design, build, and optimize large-scale ETL/ELT pipelines using AWS Glue and Amazon EMR.
• Evaluate and recommend the right compute platform (Glue vs EMR vs EMR Serverless) for different workloads with a focus on cost efficiency, scalability, and performance.
• Implement incremental processing strategies (CDC, Iceberg/Hudi, partitioning, bookmarks) to minimize compute costs.
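To make the Glue-vs-EMR cost trade-off concrete, here is a minimal back-of-the-envelope sketch of the kind of cost-per-run comparison this role would formalize. The per-unit prices are illustrative assumptions (roughly us-east-1-shaped numbers), not current AWS pricing; check the pricing pages before relying on them.

```python
# Rough cost-per-run comparison for a Spark workload on Glue vs. EMR.
# All prices below are ASSUMPTIONS for illustration; verify against
# current AWS pricing.

GLUE_PRICE_PER_DPU_HOUR = 0.44       # assumed standard Glue rate
GLUE_FLEX_PRICE_PER_DPU_HOUR = 0.29  # assumed Glue Flex rate
EMR_NODE_PRICE_PER_HOUR = 0.30       # assumed EC2 + EMR uplift per node

def glue_cost(dpus: int, runtime_hours: float, flex: bool = False) -> float:
    """Glue bills per DPU-hour; Flex jobs trade startup latency for a lower rate."""
    rate = GLUE_FLEX_PRICE_PER_DPU_HOUR if flex else GLUE_PRICE_PER_DPU_HOUR
    return dpus * runtime_hours * rate

def emr_cost(nodes: int, runtime_hours: float) -> float:
    """EMR cost is dominated by instance-hours across the cluster."""
    return nodes * runtime_hours * EMR_NODE_PRICE_PER_HOUR

# A 10-DPU Glue job for 2 hours vs. a 10-node EMR cluster for 2 hours.
print(round(glue_cost(10, 2.0), 2))             # 8.8
print(round(glue_cost(10, 2.0, flex=True), 2))  # 5.8
print(round(emr_cost(10, 2.0), 2))              # 6.0
```

Incremental processing (bookmarks, CDC, Iceberg/Hudi) attacks the `runtime_hours` term directly, which is why it is usually the highest-leverage cost control.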
CI/CD & Automation
• Build and maintain CI/CD pipelines (Terraform, GitHub Actions, CodePipeline, Jenkins, or equivalent) for deploying Glue jobs, EMR clusters, and Spark applications.
• Automate infrastructure provisioning using Terraform or AWS CDK with best practices in security and compliance.
• Set up monitoring, logging, and alerting for job performance, failures, and cost anomalies.
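As a sketch of the cost-anomaly alerting mentioned above, the check below flags failed runs and runs whose compute usage spikes well past the recent baseline. The record shape and the 2x-median threshold are illustrative assumptions, not a Glue API contract; in practice the records would come from CloudWatch metrics or the Glue job-runs API.

```python
# Minimal sketch of a cost-anomaly check over Glue job-run records.
# Record fields ("run_id", "state", "dpu_seconds") are assumed placeholders.
from statistics import median

def find_anomalous_runs(runs: list[dict], factor: float = 2.0) -> list[str]:
    """Flag failed runs, plus runs using more than `factor` x the median DPU-seconds."""
    baseline = median(r["dpu_seconds"] for r in runs)
    return [
        r["run_id"]
        for r in runs
        if r["state"] == "FAILED" or r["dpu_seconds"] > factor * baseline
    ]

runs = [
    {"run_id": "jr_1", "state": "SUCCEEDED", "dpu_seconds": 1_200},
    {"run_id": "jr_2", "state": "SUCCEEDED", "dpu_seconds": 1_150},
    {"run_id": "jr_3", "state": "FAILED",    "dpu_seconds": 300},
    {"run_id": "jr_4", "state": "SUCCEEDED", "dpu_seconds": 4_000},
]
print(find_anomalous_runs(runs))  # ['jr_3', 'jr_4']
```

Wiring a check like this into the CI/CD pipeline (e.g., as a post-deploy smoke step) catches cost regressions before they accumulate.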
Data Platform Integration
• Enable CDC pipelines from relational databases (e.g., DB2, Oracle) into S3, Redshift, or Solr.
• Work closely with downstream systems (Solr, Redshift, OpenSearch) to ensure fast and consistent indexing.
• Optimize file formats and storage layouts (Parquet, ORC, partitioning, compaction).
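The storage-layout work above typically means writing Parquet/ORC into Hive-style partitioned S3 prefixes. A small sketch of that key scheme, with placeholder table and partition-column names:

```python
# Illustrative helper for a Hive-style partitioned S3 layout, the kind of
# prefix scheme Glue/EMR jobs write Parquet into and Glue crawlers/Athena
# can prune on. Table and column names are placeholders.
from datetime import date

def partition_prefix(table: str, run_date: date) -> str:
    """Build an S3 key prefix like table/year=YYYY/month=MM/day=DD/."""
    return (
        f"{table}/"
        f"year={run_date.year}/"
        f"month={run_date.month:02d}/"
        f"day={run_date.day:02d}/"
    )

print(partition_prefix("orders", date(2024, 3, 7)))
# orders/year=2024/month=03/day=07/
```

Partition pruning on prefixes like this, plus periodic compaction of small files, is what keeps both scan costs and downstream indexing latency low.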
Governance & Cost Control
• Define standards for job scheduling, versioning, and rollback.
• Set up budgets and cost monitoring for Glue/EMR workloads, ensuring cost-per-run visibility.
• Conduct periodic architecture reviews and suggest optimizations (e.g., Flex Jobs, Spot instances, auto-scaling).
Required Skills & Experience
• Strong expertise with AWS Glue (ETL, Flex Jobs, bookmarks, Spark tuning) and Amazon EMR (managed clusters, EMR Serverless, Spot instances).
• Hands-on experience with Apache Spark (PySpark/Scala) in production at scale.
• Proficiency in Terraform (or AWS CDK/CloudFormation) for IaC.
• Experience with CI/CD pipelines (GitHub Actions, Jenkins, AWS CodePipeline/CodeBuild).
• Familiarity with CDC frameworks (AWS DMS, Debezium, Hudi/Iceberg) for incremental ingestion.
• Strong knowledge of S3 best practices (Parquet/ORC, partitioning, compaction).
• Experience integrating with downstream systems like Solr, Redshift, OpenSearch.
• Solid background in cost optimization, job scheduling, and monitoring (CloudWatch, Prometheus, Datadog, etc.).
Preferred Qualifications
• Experience running Spark on Kubernetes (EKS).
• Exposure to streaming pipelines (Kafka/MSK, Kinesis).
• Familiarity with data governance and security (IAM, Lake Formation).
• AWS Certified Big Data – Specialty (or its successor, Data Analytics – Specialty) or AWS Certified Solutions Architect.
Marktine IT Solutions
Salary: Not disclosed