Site Reliability Engineer (SRE)

LEAPWORK

5 - 8 years

16 - 25 Lacs

Gurugram

Posted: 3 months ago

Skills Required

Site Reliability Engineering, DevOps, Azure DevOps, Azure Kubernetes Service, Terraform, Docker, Azure Functions, PowerShell, SRE, IaC, Azure Monitoring, Kubernetes

Work Mode

Hybrid

Job Type

Full Time

Job Description

At Leapwork, our vision is to break down the barriers between humans and computers through the world's most accessible automation platform. We are the leading global AI-powered visual test automation solution, enabling some of the world's largest enterprises to adopt, scale, and maintain automation in under 30 days.

In today’s environment, where efficiency, automation, and cost optimization are essential to enterprise growth, we are uniquely positioned to deliver impact.

In 2023, Microsoft, the world’s largest and most recognizable software company, recognised Leapwork as a truly innovative and disruptive product, leading to a strategic partnership that continues to be a major growth catalyst.

If you're contemplating the next step in your career and seek a fast-paced company where you can impact the build and growth of something truly special, look no further!

We are headquartered in Copenhagen, Denmark and have local offices across Europe, the US and Asia.

We are looking for an experienced and forward-thinking Senior Site Reliability Engineer (SRE) with deep expertise in Microsoft Azure Cloud. In this role, you will ensure the reliability, availability, scalability, and performance of our Azure-based platforms and applications.

You will partner with cross-functional teams to design, implement, and maintain resilient infrastructure while driving automation, monitoring, and optimization initiatives across our cloud environment.

Role Responsibilities:

Service Reliability & SLOs:
Define and maintain Service Level Objectives (SLOs) for the systems you own. Continuously measure and improve availability, latency, and overall system health.
Automation & Scalability:
Develop automation to scale systems sustainably, prevent service issues, and enable rapid recovery when incidents occur.
Collaboration & Architecture Influence:
Partner with development teams to improve reliability, observability, and release velocity. Influence architectural decisions to embed high availability and operability into applications.
Incident Management:
Participate in on-call rotations, lead incident response, conduct postmortems, and drive root cause resolution with a focus on prevention.
Monitoring & Observability:
Implement and refine monitoring, alerting, and observability solutions (Azure Monitor, Datadog, Grafana, Prometheus, Loki, Tempo) to ensure proactive detection of issues.
Disaster Recovery & BCP:
Design, test, and maintain disaster recovery and business continuity strategies to safeguard system availability and data integrity.
Cost Optimization:
Monitor and optimize Azure resource usage for performance and cost efficiency.
Engineering Best Practices:
Be a vocal advocate for strong engineering practices, enabling scalable, reliable, and performant systems.
Cloud Migration Enablement:
Support cloud migration initiatives in partnership with foundation and migration teams, from architectural reviews to operational acceptance testing, including configuring Grafana dashboards and Azure Log Analytics metrics.
AI & Intelligent Automation:
Leverage AI/ML-driven tools to improve system observability, incident prediction, and automated remediation, ensuring faster recovery and reduced downtime.
SRE Agents:
Work with or build SRE Agents to automate routine operational tasks such as log analysis, anomaly detection, incident triage, and performance tuning.
Data-Driven Reliability:
Analyse monitoring data using AI/ML to identify hidden trends, optimize system health, and drive continuous improvement in reliability practices.
Documentation & Knowledge Sharing:
Maintain detailed documentation of systems, processes, and architecture to ensure alignment and smooth onboarding of team members.
Continuous Learning:
Actively participate in and foster a culture of continuous learning and development within the team.
Mentorship:
Guide and mentor junior engineers, promoting collaboration and technical growth.

Technical Qualifications / Role Requirements (Must-Have Skills)

Bachelor’s degree in Computer Science, Engineering, or a related technical field. A Master’s degree is a plus.
Proven experience (7+ years) working as an SRE with a specific focus on Microsoft Azure Cloud services.
Deep understanding of Azure services, including Azure Kubernetes Service (AKS), Azure App Service, Azure Functions, Azure Monitor, and Azure Resource Manager.
Proficiency in scripting and programming languages (e.g., PowerShell, Python) for automation, infrastructure management, and tool development.
Hands-on experience with containerization and orchestration technologies, such as Docker and Kubernetes, in an Azure context.
Strong incident management skills, with a data-driven and analytical approach to diagnosing complex issues.
Familiarity with Infrastructure as Code (IaC) tools (e.g., Terraform, ARM templates) and configuration management tools (e.g., Ansible, Chef, Puppet).
Familiarity with AI-powered monitoring, anomaly detection, and auto-remediation tools.
Experience working with SRE Agents or similar intelligent automation frameworks for operational efficiency.
Ability to integrate AI-driven insights into incident response, root cause analysis, and reliability engineering.
Excellent problem-solving skills, attention to detail, and a proactive attitude towards addressing operational challenges.
Effective communication and collaboration skills, with the ability to work across teams and influence technical decisions.
Experience with CI/CD pipelines and version control systems (e.g., Git).
Relevant Azure certifications (e.g., Microsoft Certified: Azure Solutions Architect Expert, Microsoft Certified: Azure DevOps Engineer Expert) are highly advantageous.
In-depth knowledge of monitoring and alerting tools like Grafana, Prometheus, Loki, and Tempo.
Ability to analyze monitoring data to identify trends and root causes of incidents, driving continuous improvement of system health.
A strong understanding of DevOps principles and automation practices.

Why Leapwork?

We are on an exciting journey of global growth, and this is your chance to get on board. It is an opportunity to lead and shape digital transformation initiatives in a forward-thinking company, working with and learning from a talented and passionate team committed to innovation and excellence.

By joining our team, you’ll become part of a fast-paced international environment where you can grow, challenge yourself, and do what inspires you. We work hard, but have fun while doing it – and we believe that collaboration, social activities and celebration are keys to success.

Our Leapwork principles:

Our five key principles capture the essence of what it means to be a part of our world-class team! They are integral to how we approach our work and one another, and they serve as a roadmap to our continued growth, development, achievements, and success.

Customer first

Lead from the front

Get it done

Build excellence

Respectfully different
