Posted: 6 days ago | Platform: SimplyHired

Work Mode

On-site

Job Description

Key Responsibilities

1. Reliability Engineering & Operations

  • Ensure the scalability, availability, and resilience of mission-critical systems in Azure and Kubernetes environments.

  • Define SLOs and SLIs and establish reliability best practices following SRE principles.

  • Perform root-cause analysis, incident response, post-mortems, and continuous improvement activities.

  • Automate operational tasks and reduce toil through scripting and infrastructure automation.

2. Kubernetes Platform Management

  • Deploy, configure, and manage workloads on Azure Kubernetes Service (AKS) or self-managed Kubernetes clusters.

  • Manage cluster upgrades, node pools, RBAC, secrets, ingress controllers, autoscaling, and capacity planning.

  • Implement GitOps or automated deployment workflows for Kubernetes manifests or Helm charts.

  • Optimize cluster performance, networking, and security.

3. CI/CD Pipeline Development

  • Build and maintain CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, or Jenkins.

  • Implement automated testing, build pipelines, artifact management, and secure deployment workflows.

  • Integrate CI/CD with Kubernetes, container registries, and infrastructure automation.

  • Enforce DevOps best practices, including versioning, release automation, and rollbacks.

4. Monitoring, Observability & Alerting

  • Implement and maintain observability stacks using Prometheus, Grafana, Alertmanager, Loki, or similar tools.

  • Create metrics dashboards, alerts, and performance monitoring for both applications and infrastructure.

  • Develop logging, tracing, and telemetry systems for full-stack visibility.

  • Monitor capacity, resource utilization, cluster health, and system performance.

5. Azure Cloud Engineering

  • Design and maintain Azure cloud infrastructure: virtual networks, VM scale sets, load balancers, storage, and identity management.

  • Implement infrastructure-as-code solutions using Terraform, Bicep, or ARM templates.

  • Ensure compliance, governance, scaling, and cost optimization across cloud resources.

  • Integrate Azure services (Key Vault, Monitor, Log Analytics, Container Registry, Service Bus, etc.) into platform operations.

Required Skills & Experience

  • 3-8+ years of experience in SRE, DevOps, Cloud Engineering, or related roles.

  • Strong hands-on experience with Kubernetes (AKS preferred) and cloud-native architectures.

  • Proficiency with Azure cloud services and infrastructure.

  • Solid experience building and maintaining CI/CD pipelines.

  • Deep knowledge of Grafana and Prometheus for monitoring and observability.

  • Strong scripting/automation skills in Bash, Python, or PowerShell.

  • Experience with containers (Docker), Git, and distributed systems.
