We seek a top-tier Senior Cloud Engineer (AWS) to lead strategic cloud operations, drive innovation, and shape product direction. Only candidates with proven expertise in large-scale, high-spend AWS environments will be considered.
Key Responsibilities:
Cloud Operations & Architecture
- Design, secure, and optimize cloud infrastructure handling $1M+/year spend across 100+ AWS accounts.
- Automate operations using Python, CloudFormation, and Systems Manager to enable proactive scaling and cost-triggered remediations.
- Solve complex failures in distributed systems using CloudWatch, X-Ray, and native tooling.
- Implement measurable cost optimizations (e.g., 35% savings via rightsizing and Spot adoption).
- Enforce least-privilege security, encryption, and compliance via automated audits.
- Maintain >99.9% uptime through rigorous incident management and chaos engineering.
Research & Product Leadership
- Lead research initiatives and POCs to explore new technologies and methodologies.
- Provide opinionated insights and recommendations to influence product strategy and roadmap.
- Collaborate with cross-functional teams to translate research findings into actionable product features.
Generative AI & Emerging Technologies
- Design and implement generative AI applications using Amazon Bedrock, leveraging foundation models from providers like Anthropic and AI21 Labs.
- Develop and manage agentic AI workflows utilizing Amazon Bedrock Agents, including custom orchestrators for complex task automation.
- Integrate Model Context Protocol (MCP) servers to enhance AI capabilities with domain-specific knowledge and tool access.
- Stay abreast of industry trends and emerging technologies to inform product development.
Desired Skills:
Must have:
- Must have worked at an AWS-certified MSP or Enterprise Cloud Center of Excellence (CCOE) serving multiple internal teams.
- Must be hands-on in supporting production-grade workloads across 100+ AWS accounts with >$100K/month cloud spend.
- Must have performed 5+ customer assessments, such as Foundational Technical Reviews (FTRs) or Well-Architected Framework Reviews (WAFRs), with documented optimization results.
- Must have built proactive/reactive automations for cost, security, and compliance.
- Must have used AWS-native FinOps/SecOps tools (Security Hub, Config, Cost Explorer) or third-party equivalents.
- Proven experience in AWS cost optimization and financial operations (FinOps).
- Strong proficiency in AWS services such as EC2, S3, RDS, Lambda, and Bedrock.
- Experience with AWS Cost Explorer, Budgets, and other cost management tools.
- Proficiency in scripting languages like Python or Bash for automation purposes.
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration abilities.
- AWS certifications such as AWS Certified Solutions Architect or AWS Certified AI Practitioner are preferred.
Preferred Skills:
- Experience with FinOps tools like CloudHealth, Apptio Cloudability, or similar.
- Knowledge of cloud governance and compliance standards.
- Familiarity with DevOps practices and CI/CD pipelines.
- Experience in leading research initiatives and providing strategic product insights.
Experience:
- 8+ years of experience in cloud operations, with a strong focus on AWS services.
- Hands-on experience designing and operating production-grade infrastructure on AWS.
- Practical experience writing and managing AWS CloudFormation templates.
- Hands-on experience using Amazon CloudWatch for monitoring, alerting, and dashboard configuration.
- Performed independent root cause analysis and troubleshooting in cloud production environments.
- Managed EC2 lifecycle, patching, and configuration using AWS Systems Manager in production setups.
- Ensured cloud security using IAM policies, encryption standards, and audit mechanisms in line with best practices.
- Worked directly with product or customer teams to translate business needs into scalable technical solutions.
- Led operational readiness and incident management in critical environments with proven examples.
- Experience in designing or deploying generative AI applications using Amazon Bedrock (preferred).
- Experience working with Model Context Protocol (MCP) servers or similar AI orchestration frameworks (preferred).
Education:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.