Machine Learning Operations (MLOps) Engineer (AWS/Azure)

9 years

Posted: 17 hours ago | Platform: LinkedIn

Work Mode

Remote

Job Type

Contractual

Job Description

MLOps Engineer (AWS/Azure)

Role Type:

Location:

Hours:

Start Date:

End Date:

Day Rate:


Process:

1. First interview with ICS (30 minutes).

2. CV shared with the client.

3. Meet the client and complete a technical interview/assessment.


Context of the client

  • The client is a global energy company undergoing a significant transformation to support the energy transition.
  • We work within their Customers & Products (C&P) division, serving both B2C and B2B customers across key markets, including the UK, US, Germany, Spain, and Poland.
  • This business unit includes mobility (fuel and EV), convenience retail, and loyalty.


Context of ICS:

At The Institute of Clever Stuff (ICS), we don’t just solve problems... we revolutionise results. Our mission is to empower a new generation of Future Makers today, to revolutionise results and create a better tomorrow. Our vision is to pioneer a better future together. We are a consulting firm with a difference, powered by AI, driving world-leading results from data and change. We partner with visionary organisations to solve their toughest challenges, drive transformation, and deliver high-impact results.


fortu.ai


Essential Requirements

  • 9+ years of relevant professional experience, including 5+ years in platform engineering, designing, deploying, and managing scalable, secure cloud infrastructure across both Azure and AWS.
  • Strong grounding in governance, audit, observability, and compliance for cloud-based GenAI/ML ecosystems.
  • Proven experience setting up and managing CI/CD using Azure DevOps or AWS CodePipeline.
  • Proficiency with infrastructure‑as‑code (ARM/Bicep, Terraform, CloudFormation, CDK) and containerisation (Docker, Kubernetes).
  • Advanced understanding of networking (DNS, load balancing, VPNs, VNets/VPCs) and security (IAM, RBAC, policies, SCPs).
  • Solid programming skills in Python plus scripting (Bash, PowerShell); familiarity with mainstream AI/ML libraries (TensorFlow, PyTorch, scikit‑learn).
  • Experience with cloud data stores and key management (Azure Blob, Cosmos DB, SQL, Key Vault; AWS S3, DynamoDB, RDS, KMS) and their integrations with AI services.


Core Technical Expertise (Must Have):

  • Azure & AWS ML/AI services: Azure ML, Azure AI Services, Azure AI Search; AWS SageMaker, AWS Bedrock, AWS Lambda.
  • GenAI & Agentic ecosystems: Exposure to Generative AI and Agentic AI ecosystems, such as Azure OpenAI, Azure AI Foundry/Hub, Bedrock, Anthropic Claude, OpenAI API, LlamaCloud, LangChain.
  • Security & identity: Azure Policy, Azure RBAC, AWS IAM, AWS SCPs; audit logging; least‑privilege design.
  • IaC & platform automation: ARM/Bicep, Terraform, CloudFormation, CDK.
  • DevOps/CI‑CD: Azure DevOps or AWS CodePipeline; integration and delivery for data science and ML workflows.
  • Data & storage: Azure Blob/Cosmos/SQL/Key Vault; AWS S3/DynamoDB/RDS; understanding of OLTP and OLAP patterns.
  • Containers & orchestration: Docker and Kubernetes (including AKS/EKS patterns and ECR/ACR usage).
  • Monitoring & observability: Grafana, Prometheus, Azure Monitor, Application Insights, Log Analytics Workspaces.
  • Networking: DNS management, load balancing, VPNs, virtual networks (VNets/VPCs).
  • Testing: Unit and integration testing as part of CI/CD (ideally on Azure DevOps).
  • ML tooling: Azure ML Studio, Python SDK (v2), CLI (v2) for monitoring, retraining, and redeployment.
  • AI safety & evaluation: Understanding of token usage; prompt injection/jailbreak risks and mitigations; Azure AI Evaluation SDK; AI red‑teaming and prompt security scans.


Working Methods:

  • Agile, sprint‑based delivery with Azure DevOps (boards, repos, pipelines).
  • Strong DevOps and CI/CD pipeline management across environments.
  • Close collaboration with Data Scientists, Data Analysts, Software Engineers, and platform teams.
  • Clear documentation and communication suited to distributed teams.
  • Stakeholder engagement to troubleshoot ML pipeline issues and support modelling infrastructure needs.


Beneficial Experience:

  • Developer productivity: GitHub Copilot, Cursor, Claude Code.
  • Microsoft/Azure services: Azure Bot Framework, API Management, Application Gateway, M365 Copilot.
  • AWS SDKs & tooling: Boto3, AWS CDK.
  • Notebooks & experimentation: Jupyter Notebook.
  • ML frameworks: PyTorch, TensorFlow, scikit‑learn; practical E2E ML workflow design.


Responsibilities


Platform & Infrastructure

  • Design, deploy, and manage scalable and secure cloud infrastructure across Azure and AWS using IaC (ARM/Bicep/Terraform/CloudFormation/CDK).
  • Implement core networking (DNS, load balancing, VPNs, VNets/VPCs) and platform services for reliability and performance.
  • Build and operate container platforms (Docker, Kubernetes; ACR/AKS and ECR/EKS patterns).
  • Set up comprehensive monitoring and logging (Grafana, Prometheus, Azure Monitor, Application Insights, Log Analytics).


Security & Compliance:

  • Apply the principle of least privilege across cloud platforms (Azure RBAC, AWS IAM) and enforce policy (Azure Policy, AWS SCPs).
  • Enable audit logging and controls appropriate for GenAI/ML workloads.
  • Manage secrets and keys with Azure Key Vault and AWS KMS.


CI/CD & Testing

  • Implement CI/CD for data science/ML pipelines with Azure DevOps or AWS CodePipeline.
  • Embed robust unit and integration testing in the pipeline; champion code quality and operational readiness.


Infrastructure as Code (IaC)

  • Define and evolve cloud resources as code; review and maintain standards, patterns, and reusable modules.
  • Use Python or TypeScript where appropriate to codify infrastructure definitions.


Cloud Services (AWS & Azure)

  • AWS: RDS, DynamoDB, Redshift, Aurora; EC2 (scaling), EBS/EFS; serverless (Lambda, SQS, SNS, EventBridge, Step Functions); containers (ECR); Bedrock; SageMaker; CloudFormation (CDK); KMS.
  • Azure: Cosmos DB, Azure SQL (including Serverless); compute (VMs, Scale Sets); serverless (Functions, Event Grid/Hub, Queue Storage, Service Bus); container services (ACR/AKS); Azure Resource Manager (ARM)/Bicep; Azure Key Vault; Azure Machine Learning; Azure Data Lake Storage.


MLOps & Model Lifecycle

  • Enable production models across the ML lifecycle (deployment, monitoring for drift, retraining, technical evaluation, and business validation).
  • Implement CI/CD orchestration for data science pipelines and support model governance.
  • Collaborate with stakeholders to resolve ML pipeline issues and evolve the modelling platform.
