As a Senior Cloud Platform Engineer, you will be an individual contributor and subject matter expert who participates in the design, implementation, and maintenance of technology solutions. You will collaborate within a team of technologists to produce enterprise-scale solutions for our clients' needs. You will work with the latest Amazon Web Services technologies for cloud architecture, infrastructure automation, and network security.
What you'll do:
- Design and implement automated disaster recovery (DR) solutions for cloud-native and hybrid applications hosted on AWS
- Develop and maintain infrastructure-as-code (IaC) scripts (e.g., Terraform, CloudFormation) to provision DR environments across multiple AWS regions
- Build automated failover and recovery workflows using various AWS services such as Lambda, Step Functions, EC2, RDS, Route 53, and S3
- Collaborate with application teams and other infrastructure teams to maintain defined Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO), and translate them into technical DR strategies
- Integrate DR automation into CI/CD pipelines to ensure DR readiness is continuously validated; participate in periodic disaster recovery exercises, analyze results, and implement improvements to reduce recovery time and increase reliability
- Partner closely with cross-functional leaders (development, product management, stakeholders, etc.) to ensure a clear understanding of business and technical needs; jointly select the best strategy after evaluating the benefits and costs associated with different approaches
- Provide technical leadership and mentorship to junior engineers on DR best practices and automation techniques
- Architect cloud solutions using industry-leading DevOps best practices and technologies
- Identify, prototype, and test solutions, and lead proof-of-concept initiatives as needed
- Use existing security tools, mitigate vulnerabilities, devise testing strategies, automate repeatable tasks, and participate in root cause analysis
- Collaborate closely with implementation teams to ensure they understand and use the optimal approach
- Help develop architectural standards and guidelines for scalability, performance, resilience, and efficient operations while adhering to necessary security and compliance standards
What you'll bring:
- Bachelor's/Master's degree in Computer Science, Information Technology or a related discipline with significant programming experience
- 3 to 6+ years in a cloud engineering, infrastructure-as-code, or DevOps role, deploying and maintaining SaaS applications
- 2+ years of experience with AWS cloud technologies; at least one AWS certification (Solutions Architect or DevOps Engineer) is required
- 1+ years of experience as a senior member of an infrastructure/software team
- Hands-on experience with AWS services such as Lambda, S3, RDS, EMR, CloudFormation, CodeBuild, Config, Systems Manager, and Service Catalog, as well as Terraform
- Full-stack IT experience with *nix, Windows, network/firewall concepts, source control, build/dependency management, and continuous integration systems (TeamCity or Jenkins)
- Strong experience building automation using scripting languages such as Bash, Python, or PowerShell
- Experience working with and contributing to software application release/deployment management processes in a SaaS model
- Experience conducting DR drills and automated recovery simulations
- Experience architecting and implementing deployment pipelines for cloud-based solutions with robust Business Continuity and Disaster Recovery requirements
- Strong verbal, written, and presentation skills; this role requires deep and continuous collaboration with product engineering, QA, and various cloud teams
- Strong initiative and the ability to remain flexible and responsive in a very dynamic environment
- Ability to work around unknowns and develop robust automations or solutions
- Experience delivering quality work on defined tasks with limited oversight
- Ability to quickly learn new platforms, languages, tools, and techniques as needed to meet project requirements