Site Reliability Engineer (SRE) With Azure & AI

6 years

0 Lacs

Posted: 5 days ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Contractual

Job Description

Job Details:

Job Title: Site Reliability Engineer (SRE) With Azure & AI

Duration: Contract Position (On the Payroll of Datum Technology Group)

Location: Chennai, Mumbai, or Gurugram

Interview Process: Virtual (2 rounds) + 1 technical screening.


  • We are seeking a skilled and collaborative Site Reliability Engineer (SRE) with deep expertise in Azure cloud hosting, AI infrastructure, and automation.
  • The ideal candidate will have hands-on experience managing cloud environments through the GitHub/Azure DevOps lifecycle and a strong understanding of AI model deployment and scaling.
  • You will work closely with a team of engineers to ensure reliable, secure, and scalable infrastructure for AI workloads and enterprise applications.


Key Responsibilities

  • Design, build, and maintain scalable cloud infrastructure on Microsoft Azure.
  • Automate infrastructure provisioning and deployment using Terraform, Argo, and Helm.
  • Manage and optimize Azure Kubernetes Service (AKS) clusters for AI and microservices workloads.
  • Support hosting of AI models using frameworks like Hugging Face Transformers, vLLM, or llama.cpp on Azure OpenAI, VMs, or GPUs.
  • Implement CI/CD pipelines using GitHub Actions and integrate with JFrog Artifactory.
  • Monitor system performance and reliability using Grafana and proactively address issues.
  • Collaborate with software engineers to ensure infrastructure supports application needs.
  • Ensure compliance with networking and information security best practices.
  • Manage caching and data layer performance using Redis.


Required Skills & Technologies

Core to Role:

  • Azure Cloud Services (including Azure OpenAI)
  • AI Model Hosting & Infrastructure Knowledge
  • GitHub (CI/CD, workflows)
  • Azure Kubernetes Service (AKS)
  • Argo, Helm
  • Terraform
  • Docker
  • JFrog
  • Grafana
  • Networking & Security
  • Redis


Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 6+ years of experience in SRE, DevOps, or Cloud Infrastructure roles.
  • Proven experience with AI infrastructure and model deployment.
  • Strong communication and teamwork skills.
