MLOps Engineer (AWS/Azure)
Role Type:
Location:
Hours:
Start Date:
End Date:
Day Rate:
Process:
1. First interview with ICS (30 minutes).
2. CV shared with the client.
3. Meet the client and complete a technical interview/assessment.
Context of the client:
- The client is a global energy company undergoing a significant transformation to support the energy transition.
- We work within their Customers & Products (C&P) division, serving both B2C and B2B customers across key markets, including the UK, US, Germany, Spain, and Poland.
- This business unit includes mobility (fuel and EV), convenience retail, and loyalty.
Context of the ICS:
At The Institute of Clever Stuff (ICS), we don’t just solve problems; we revolutionise results. Our mission is to empower a new generation of Future Makers today to revolutionise results and create a better tomorrow, and our vision is to pioneer a better future together. We are a consulting firm with a difference, powered by AI (fortu.ai), driving world-leading results from data and change. We partner with visionary organisations to solve their toughest challenges, drive transformation, and deliver high-impact results.
Essential Requirements
- 9+ years of relevant professional experience, including 5+ years in platform engineering, designing, deploying, and managing scalable, secure cloud infrastructure across both Azure and AWS.
- Strong grounding in governance, audit, observability, and compliance for cloud-based GenAI/ML ecosystems.
- Proven experience setting up and managing CI/CD using Azure DevOps or AWS CodePipeline.
- Proficiency with infrastructure-as-code (ARM/Bicep, Terraform, CloudFormation, CDK) and containerisation (Docker, Kubernetes).
- Advanced understanding of networking (DNS, load balancing, VPNs, VNets/VPCs) and security (IAM, RBAC, policies, SCPs).
- Solid programming skills in Python plus scripting (Bash, PowerShell); familiarity with mainstream AI/ML libraries (TensorFlow, PyTorch, scikit-learn).
- Experience with cloud data stores and key management (Azure Blob, Cosmos DB, SQL, Key Vault; AWS S3, DynamoDB, RDS/KMS) and their integrations with AI services.
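As a flavour of the last requirement, the sketch below reads a secret from Azure Key Vault and a training artefact from Blob Storage with a single Azure credential. It is a minimal illustration only; the vault URL, storage account URL, secret name, and container/blob names are placeholders, not project specifics.

```python
# Minimal sketch: pull a secret from Azure Key Vault and read a blob with one credential.
# Vault URL, account URL, secret name, and container/blob names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()  # picks up managed identity or local dev credentials

# Fetch a secret (e.g. an API key for a downstream AI service)
secrets = SecretClient(vault_url="https://example-vault.vault.azure.net", credential=credential)
api_key = secrets.get_secret("example-ai-service-key").value

# Read a training artefact from Blob Storage using the same credential
blobs = BlobServiceClient(account_url="https://examplestorage.blob.core.windows.net", credential=credential)
data = blobs.get_blob_client(container="training-data", blob="features.parquet").download_blob().readall()
print(len(data), "bytes downloaded")
```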
Core Technical Expertise (Must Have):
- Azure & AWS ML/AI services: Azure ML, Azure AI Services, Azure AI Search; AWS SageMaker, AWS Bedrock, AWS Lambda.
- GenAI & Agentic ecosystems: exposure to Generative AI and Agentic AI ecosystems, such as Azure OpenAI, Azure AI Foundry/Hub, Bedrock, Anthropic Claude, OpenAI API, LlamaCloud, LangChain.
- Security & identity: Azure Policy, Azure RBAC, AWS IAM, AWS SCPs; audit logging; least-privilege design.
- IaC & platform automation: ARM/Bicep, Terraform, CloudFormation, CDK.
- DevOps/CI-CD: Azure DevOps or AWS CodePipeline; integration and delivery for data science and ML workflows.
- Data & storage: Azure Blob/Cosmos/SQL/Key Vault; AWS S3/DynamoDB/RDS; understanding of OLTP and OLAP patterns.
- Containers & orchestration: Docker and Kubernetes (including AKS/EKS patterns and ECR/ACR usage).
- Monitoring & observability: Grafana, Prometheus, Azure Monitor, Application Insights, Log Analytics Workspaces.
- Networking: DNS management, load balancing, VPNs, virtual networks (VNets/VPCs).
- Testing: unit and integration testing as part of CI/CD (ideally on Azure DevOps).
- ML tooling: Azure ML Studio, Python SDK (v2), CLI (v2) for monitoring, retraining, and redeployment.
- AI safety & evaluation: token usage comprehension; prompt injection/jailbreak risks and mitigations; Azure AI Evaluation SDK; AI red-teaming prompt security scans.
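To illustrate the ML tooling item above, here is a minimal sketch using the Azure ML Python SDK (v2) to connect to a workspace and inspect registered models and an online endpoint. The subscription, resource group, workspace, and endpoint names are placeholders, and only read-only calls are shown.

```python
# Minimal sketch: connect to an Azure ML workspace (SDK v2) and inspect assets.
# Subscription, resource group, workspace, and endpoint names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# List registered models (name/version pairs)
for model in ml_client.models.list():
    print(model.name, model.version)

# Check the traffic split on a managed online endpoint, e.g. during a blue/green rollout
endpoint = ml_client.online_endpoints.get(name="example-endpoint")
print(endpoint.traffic)  # e.g. {"blue": 90, "green": 10}
```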
Working Methods:
- Agile, sprint-based delivery with Azure DevOps (boards, repos, pipelines).
- Strong DevOps and CI/CD pipeline management across environments.
- Close collaboration with Data Scientists, Data Analysts, Software Engineers, and platform teams.
- Clear documentation and communication suited to distributed teams.
- Stakeholder engagement to troubleshoot ML pipeline issues and support modelling infrastructure needs.
Beneficial Experience:
- Developer productivity: GitHub Copilot, Cursor, Claude Code.
- Microsoft/Azure services: Azure Bot Framework, API Management, Application Gateway, M365 Copilot.
- AWS SDKs & tooling: Boto3, AWS CDK.
- Notebooks & experimentation: Jupyter Notebook.
- ML frameworks: PyTorch, TensorFlow, scikit-learn; practical E2E ML workflow design.
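To illustrate the Boto3 tooling mentioned above, here is a minimal sketch that lists SageMaker endpoints and invokes a Bedrock-hosted model. The region, model ID, and prompt are illustrative assumptions rather than project specifics.

```python
# Minimal Boto3 sketch: list SageMaker endpoints and invoke a Bedrock model.
# Region, model ID, and prompt are illustrative placeholders.
import json
import boto3

sagemaker = boto3.client("sagemaker", region_name="eu-west-2")
for ep in sagemaker.list_endpoints()["Endpoints"]:
    print(ep["EndpointName"], ep["EndpointStatus"])

bedrock = boto3.client("bedrock-runtime", region_name="eu-west-2")
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Summarise today's pipeline failures."}],
    }),
)
print(json.loads(response["body"].read()))
```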
Responsibilities
Platform & Infrastructure
- Design, deploy, and manage scalable and secure cloud infrastructure across Azure and AWS using IaC (ARM/Bicep/Terraform/CloudFormation/CDK).
- Implement core networking (DNS, load balancing, VPNs, VNets/VPCs) and platform services for reliability and performance.
- Build and operate container platforms (Docker, Kubernetes; ACR/AKS and ECR/EKS patterns).
- Set up comprehensive monitoring and logging (Grafana, Prometheus, Azure Monitor, Application Insights, Log Analytics).
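To give a flavour of the monitoring and logging responsibility above, here is a minimal sketch that exposes custom Prometheus metrics from a Python inference service so they can be scraped and visualised in Grafana. The metric names, port, and stand-in predict function are illustrative assumptions only.

```python
# Minimal sketch: expose custom metrics from an ML service for Prometheus/Grafana.
# Metric names, port, and the dummy inference step are illustrative.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("model_predictions_total", "Total predictions served")
LATENCY = Histogram("model_prediction_latency_seconds", "Prediction latency in seconds")

def predict(features: dict) -> float:
    with LATENCY.time():                    # record latency per call
        time.sleep(random.random() / 100)   # stand-in for real model inference
        PREDICTIONS.inc()
        return 0.5

if __name__ == "__main__":
    start_http_server(8000)                 # scrape target at :8000/metrics
    while True:
        predict({"feature": 1.0})
        time.sleep(1)
```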
Security & Compliance:
- Apply the principle of least privilege across cloud platforms (Azure RBAC, AWS IAM) and enforce policy (Azure Policy, AWS SCPs).
- Enable audit logging and controls appropriate for GenAI/ML workloads.
- Manage secrets and keys with Azure Key Vault and AWS KMS.
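As a small illustration of the key management responsibility, the sketch below encrypts and decrypts a short payload with AWS KMS via Boto3. The key alias is a placeholder; larger payloads would more typically use generated data keys (envelope encryption) rather than direct encrypt calls.

```python
# Minimal sketch: encrypt and decrypt a small payload with AWS KMS.
# The key alias is a placeholder, not an existing key.
import boto3

kms = boto3.client("kms", region_name="eu-west-2")

ciphertext = kms.encrypt(
    KeyId="alias/example-genai-key",   # placeholder key alias
    Plaintext=b"model-api-token",
)["CiphertextBlob"]

plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
assert plaintext == b"model-api-token"
```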
CI/CD & Testing
- Implement CI/CD for data science/ML pipelines with Azure DevOps or AWS CodePipeline.
- Embed robust unit and integration testing in the pipeline; champion code quality and operational readiness.
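As an example of the testing expectation above, a minimal pytest-style unit test is sketched below. The validate_features helper is hypothetical, invented purely for illustration, not part of any existing codebase.

```python
# Minimal sketch of the kind of unit test run inside the CI/CD pipeline.
# validate_features is a hypothetical helper used only for this example.
import pytest

def validate_features(record: dict) -> dict:
    """Reject records with missing or negative numeric features."""
    if any(v is None or (isinstance(v, (int, float)) and v < 0) for v in record.values()):
        raise ValueError("invalid feature value")
    return record

def test_valid_record_passes():
    assert validate_features({"tenure": 12, "spend": 34.5}) == {"tenure": 12, "spend": 34.5}

def test_negative_value_rejected():
    with pytest.raises(ValueError):
        validate_features({"tenure": -1, "spend": 34.5})
```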
Infrastructure as Code (IaC)
- Define and evolve cloud resources as code; review and maintain standards, patterns, and reusable modules.
- Use Python or TypeScript where appropriate to codify infrastructure definitions.
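Since the point above mentions using Python or TypeScript to codify infrastructure, here is a minimal AWS CDK (Python) sketch of that idea: an encrypted, versioned S3 bucket defined as code. The stack and construct names are placeholders, and the settings shown (versioning, KMS-managed encryption, blocked public access) are illustrative defaults rather than agreed project standards.

```python
# Minimal AWS CDK (Python) sketch: an encrypted, versioned S3 bucket defined as code.
# Stack and construct names are placeholders.
from aws_cdk import App, RemovalPolicy, Stack, aws_s3 as s3
from constructs import Construct

class MlArtifactsStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        s3.Bucket(
            self,
            "ModelArtifacts",
            versioned=True,                                      # keep model artefact history
            encryption=s3.BucketEncryption.KMS_MANAGED,          # server-side encryption via KMS
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,  # no public access
            removal_policy=RemovalPolicy.RETAIN,                 # never delete artefacts on stack teardown
        )

app = App()
MlArtifactsStack(app, "MlArtifactsStack")
app.synth()
```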
Cloud Services (AWS & Azure)
- AWS: RDS, DynamoDB, Redshift, Aurora; EC2 (scaling), EBS/EFS; serverless (Lambda, SQS, SNS, EventBridge, Step Functions); containers (ECR); Bedrock; SageMaker; CloudFormation (CDK); KMS.
- Azure: Cosmos DB, Azure SQL (including Serverless); compute (VMs, Scale Sets); serverless (Functions, Event Grid/Hub, Queue Storage, Service Bus); container services (ACR/AKS); Azure Resource Manager (ARM)/Bicep; Azure Key Vault; Azure Machine Learning; Azure Data Lake Storage.
MLOps & Model Lifecycle
- Enable production models across the ML lifecycle (deployment, monitoring for drift, retraining, technical evaluation, and business validation).
- Implement CI/CD orchestration for data science pipelines and support model governance.
- Collaborate with stakeholders to resolve ML pipeline issues and evolve the modelling platform.
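One possible shape of the drift monitoring mentioned above is sketched below: a two-sample Kolmogorov-Smirnov test comparing a live feature sample against its training baseline. The threshold, synthetic data, and retraining trigger are illustrative assumptions, not the client's actual monitoring design.

```python
# Minimal sketch of one possible drift check: compare a live feature sample against
# the training baseline with a two-sample Kolmogorov-Smirnov test.
# Threshold and data sources are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(baseline: np.ndarray, live: np.ndarray, p_threshold: float = 0.01) -> bool:
    """Return True when the live distribution differs significantly from the baseline."""
    statistic, p_value = ks_2samp(baseline, live)
    return p_value < p_threshold

rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)   # training-time distribution
live = rng.normal(loc=0.4, scale=1.0, size=1_000)       # shifted production sample

if feature_drifted(baseline, live):
    print("Drift detected: flag model for retraining and re-evaluation")
```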