About Us
We are a leading global digital product development firm. We unify strategy, design and technology with continuous growth-centric digital product engineering solutions for F500 companies and global brands, including Bell Telecom, Bausch Health (previously Valeant Pharma), Honda Motors, AES Corp, Thomson Reuters Carswell, First American and Colliers International.
Our passion is empowering innovators and change-makers at every level of the product life cycle. We specialize in building customized business apps that allow organizations and enterprises to improve their efficiency, collaboration and user experience.
About the Role :
We are seeking a Senior Cloud Engineer to join our Infrastructure & Platform Engineering team. This role is critical to managing secure, scalable, and high-performance cloud infrastructure while embedding automation, AI-driven service optimization, and regulatory compliance. You'll work across production and non-production environments, collaborating with development, QA, and service teams to ensure smooth, secure, and efficient platform operations.
This is an exciting opportunity to work at the intersection of cloud engineering, AI-enhanced service management, and DevOps best practices in a mission-critical environment.
What You'll Do :
Cloud Infrastructure & Automation
- Design, build, and manage AWS/Azure/GCP-based infrastructure using Terraform, CloudFormation, and IaC pipelines.
- Manage compute, networking, load balancing, identity, and scaling for enterprise-grade workloads.
- Deploy and manage virtualized services to ensure uptime and high availability.
CI/CD Integration :
- Build and maintain CI/CD pipelines using GitHub Actions, Jenkins, and GitLab CI.
- Support serverless, containerized (Docker, Kubernetes, EKS/ECS) and event-driven platforms.
- Automate environment provisioning, application deployments, monitoring, and alerts.
IT Service Management (ITSM) & Incident Handling
- Manage and streamline incident, problem, and change workflows using tools like ServiceNow, Azure DevOps, or Jira Service Management.
- Lead incident response and root cause analysis (RCA); define preventive measures and SLAs to maintain system reliability.
- Maintain detailed operational documentation and runbooks for high-availability support.
Service Optimization & Analytics :
- Build and integrate AI-enhanced tooling (e.g., ChatGPT, predictive analytics, or self-healing bots) to accelerate service delivery and reduce resolution time.
- Use AI/ML for intelligent alerting, auto-remediation, and anomaly detection across observability tools.
- Analyze system performance and ITSM data to generate actionable insights using data visualization platforms (e.g., Power BI, Tableau).
Risk and Compliance :
- Implement secure coding, encryption, network segmentation, IAM best practices, and Zero Trust principles.
- Ensure compliance with SOC2, GDPR, ISO 27001, DORA, and internal InfoSec standards.
- Conduct regular system audits and SAST/SCA scans, and support regulatory compliance activities.
Continuous Improvement :
- Act as a liaison between infra, dev, and QA teams to ensure seamless integration and deployment.
- Promote a continuous improvement culture, identifying automation opportunities and driving operational efficiencies.
- Mentor junior team members on best practices in infrastructure-as-code, observability, and security.
What You Bring :
- 5+ years of experience managing cloud-based infrastructure and enterprise IT environments.
- Strong skills in automation, scripting, and CI/CD pipelines.
- Hands-on experience with ITSM tools and AI-powered service delivery enhancements.
- Working knowledge of regulatory compliance frameworks (SOC2, GDPR, ISO 27001, DORA).
- Demonstrated ability in incident response, RCA, and preventive measures.
- Strong analytical and diagnostic abilities to troubleshoot complex cloud issues.
- Excellent cross-functional communication and collaboration skills.
- Bachelor's degree in Computer Science, Engineering, or a related field.
Nice To Have
- Certifications : AWS Certified Solutions Architect / DevOps Engineer / Security Specialist
- Exposure to self-healing architectures, anomaly detection, or AI-based automation agents
- Experience integrating observability with ITSM workflows for automated post-incident insights
- Familiarity with edge computing, serverless frameworks, and VPC/networking optimization
(ref:hirist.tech)