Primary Skill: 3-5 years of hands-on experience with Apache Airflow, any cloud platform (AWS/Azure/GCP), and Python
Secondary:
Scope and Responsibilities:
As a Senior Engineer with a focus on Managed Airflow Platform (MAP) support engineering, you will:
- Evangelize and cultivate adoption of Global Platforms, open-source software and agile principles within the organization
- Ensure solutions are designed and developed using a scalable, highly resilient, cloud-native architecture
- Ensure the operational stability, performance, and scalability of cloud-native platforms through proactive monitoring and timely issue resolution
- Diagnose infrastructure and system issues across cloud environments and Kubernetes clusters, and lead efforts in troubleshooting and remediation
- Collaborate with engineering and infrastructure teams to manage configurations, resource tuning, and platform upgrades without disrupting business operations
- Maintain clear, accurate runbooks, support documentation, and platform knowledge bases to enable faster onboarding and incident response
- Support observability initiatives by improving logging, metrics, dashboards, and alerting frameworks
- Advocate for operational excellence and drive continuous improvement in system reliability, cost-efficiency, and maintainability
- Work with product management to support product / service scoping activities
- Work with leadership to define delivery schedules of key features through an agile framework
- Be a key contributor to overall architecture, framework and design of global platforms
Required Qualifications:
- Bachelor's or Master's degree in Computer Science or a related field
- 3+ years of experience in large-scale, production-grade platform support, including participation in on-call rotations
- 3+ years of hands-on experience with cloud platforms like AWS, Azure, or GCP
- 2+ years of experience developing and supporting data pipelines using Apache Airflow, including:
  - DAG lifecycle management and scheduling best practices
  - Troubleshooting task failures, scheduler issues, and performance bottlenecks, and managing error handling
- Strong programming proficiency in Python, especially for developing and troubleshooting RESTful APIs
- Working knowledge of Node.js is an added advantage
- 1+ years of experience in observability using the ELK stack (Elasticsearch, Logstash, Kibana) or the Grafana stack
- 2+ years of experience with DevOps and Infrastructure-as-Code tools such as GitHub, Jenkins, Docker, and Terraform
- 2+ years of hands-on experience with Kubernetes, including managing and debugging cluster resources and workloads within Amazon EKS
- Exposure to Agile and test-driven development is a plus
- Experience delivering projects in a highly collaborative, multi-discipline development team environment
Desired Qualifications:
- Experience participating in projects in a highly collaborative, multi-discipline development team environment
- Exposure to Agile, ideally a strong background with the SAFe methodology
- Experience with any monitoring or observability tool is a plus