Service Engineer

Microsoft

5 years

0 Lacs

Hyderabad Telangana India

Posted: 2 months ago

Skills Required

service resolve reliability azure data engineering management empathy drive restore support addition communication development automation metrics agility software triage coordination leadership auto remediation strategies documentation design architecture effectiveness scalability collaboration aws gcp microservices containerization monitoring datadog splunk orchestration kubernetes docker code terraform ai ml powershell python itil tuning linux developer technology analysis correlation certifications devops regulations

Work Mode

On-site

Job Type

Full Time

Job Description

Are you passionate about cloud computing, obsessed with customer experience, and driven to resolve complex issues under pressure? Do you thrive in high-stakes, live environments and want to play a pivotal role in ensuring the reliability of Microsoft’s cloud platform? If so, the Azure Customer Experience (CXP) team has the opportunity for you.

Microsoft Azure is one of the most exciting and strategic products at Microsoft, powering mission-critical workloads for enterprises, governments, and startups around the world. Azure delivers on-demand, hyper-scale infrastructure and platforms via Microsoft's global data centers, enabling customers to build, host, and scale their applications with confidence.

The Customer Reliability Engineering (CRE) team within Azure CXP is a top-level pillar of Azure Engineering. It is responsible for world-class live-site management, customer reliability engagements, and modern customer-first experiences for scale, and it drives deep customer insights and empathy into the broader Azure Engineering organization. Our “no dead ends” philosophy ensures that every customer, regardless of size or scale, can realize their full potential through the Microsoft Cloud.

We are seeking decisive and experienced Service Engineers with proven incident and crisis management experience. These engineers will manage Live Site issues, drive Problem Management, and enhance customer reliability. The ideal candidate will possess deep technical expertise in Azure Core Services and their intricate interdependencies, coupled with a proven ability to manage complex, highly available services at scale.

As the single point of command and control during high-severity incidents, you will orchestrate cross-functional engineering, operations, and communications to swiftly restore services, minimize impact, and safeguard the trust of our global customer base. You will work closely with Customers, First Parties, Customer Support, Livesite, and Engineering teams to deliver critical, customer-facing features. Success in this role requires the ability to influence and collaborate across many Azure servicing teams to ensure customer needs are met. You’ll be surrounded by elite developers, data scientists, and customer-obsessed engineers who care deeply about continuous improvement and resilient cloud operations.

In addition, this role includes on-call responsibilities for managing and resolving complex multi-service outages. It requires the ability to remain effective under pressure, apply broad technical and analytical skills, and coordinate seamlessly with internal service teams and stakeholders. Strong communication skills, both written and verbal, are essential.

You will also lead the evolution of Azure's Incident Management practice through Post-Incident Reviews, process development, and system automation. By leveraging telemetry and metrics, you will identify and drive platform-wide improvements with global impact. This role offers a unique opportunity to make an immediate impact and improve systems at scale.

Responsibilities

To be successful in this role, you must have a great track record of customer compassion, an engineering mindset, an innate aptitude for agility, and technical excellence in software engineering. Collaborate closely with Engineering and PM to ensure the availability and performance of Live Site and the satisfaction of our customers.

Manage high-severity incidents (SEV0/SEV1/SEV2) across Azure services, serving as the single point of accountability to ensure rapid detection, triage, resolution, and customer communication.
Act as the central authority during live site incidents, driving real-time decision-making and coordination across Engineering, Support, PM, Communications, and Field teams.
Participate in the on-call rotation.
Provide calm, decisive leadership in crisis situations, escalating as needed to senior leadership.
Promote a customer-first culture by prioritizing availability, reliability, and platform trust in every response.
Contribute to analyzing customer-impacting signals from telemetry, support cases, and feedback to identify root causes, drive incident reviews (RCAs/PIRs), and implement preventative service improvements.
Contribute to Azure platform improvements by incorporating learnings from live site events and customer feedback, ensuring improved reliability, observability, and supportability.
Collaborate closely with Engineering and Product teams to influence and implement service resiliency enhancements, auto-remediation tools, and customer-centric mitigation strategies.
Identify and advocate for customer self-service capabilities, improved documentation, and scalable solutions that empower customers to resolve common issues independently.
Contribute to the development and adoption of incident response playbooks, mitigation levers, and operational frameworks aligned to real-world support scenarios and strategic customer needs.
Contribute to the design of next-generation architecture for cloud infrastructure services with a focus on reliability and strategic customer support outcomes.
Build and maintain cross-functional partnerships, ensuring alignment across engineering, business, and support organizations.
Be data-driven and results-focused, using metrics to evaluate incident response effectiveness and platform health.
Apply an engineering mindset to operational challenges, balancing agility, scalability, and technical quality in collaboration with peers.
Demonstrate strong collaboration and results-focused execution under pressure while working closely with other teams.

Qualifications

Required Qualifications

5+ years’ proven expertise in mission-critical cloud operations, high-severity incident response, SRE, or large-scale systems engineering on hyperscale platforms like Azure, AWS, or GCP.
Must have Service Engineering experience in a 24x7x365 enterprise environment.
Exceptional command-and-control communication skills: able to drive clarity and direction with customers, internal Microsoft stakeholders, and third-party vendors during ambiguity and chaos.
Deep understanding of cloud architecture patterns, microservices, and containerization.
Demonstrated ability to make decisions quickly, under pressure, and with limited data—without compromising long-term reliability.
Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Datadog, Splunk, New Relic).
Ability to contribute to implementing observability frameworks that proactively detect performance bottlenecks.
Strong knowledge of CI/CD pipelines, container orchestration (Kubernetes, Docker), and infrastructure as code (Terraform, ARM, Bicep).
Familiarity with AI/ML frameworks and cloud AI services.
Experience implementing AI-driven monitoring, alerting, and remediation systems.
Fluency in one or more automation languages (PowerShell, Python, CLI, etc.).
An understanding of ITIL or other incident management frameworks is a must.
Understanding of high availability, disaster recovery, business continuity, and performance tuning.
Demonstrated strategic thinking, quantitative and analytical skills, team leadership, and collaboration.
Excellent problem resolution, judgment, negotiation, and decision-making skills.
Desired: strong knowledge of the Windows platform or Linux and developer tools, and the ability to diagnose and debug user code.
Ability to effectively manage and prioritize multiple tasks in accordance with high-level objectives and projects.
Excellent communication skills (written and verbal) in English, especially in high-pressure scenarios.
Ability to communicate with a variety of audiences, including high-profile customers, executive management, and engineering teams.
Experience with Azure, AWS, or GCP core services and their interdependence.
Bachelor’s or master’s degree in computer science, information technology, or equivalent experience.

Preferred Qualifications

8+ years of demonstrated experience as an Incident Commander or Crisis Manager for critical, high-severity incidents in high-availability, distributed environments.
Experience with SRE (Site Reliability Engineering) principles and practices.
Exposure to chaos engineering, fault injection, or high availability architecture.
AI/ML experience (beginner to intermediate):
Familiarity with how AI/ML models are integrated into cloud infrastructure and their potential failure modes.
Experience using AI-powered tools for incident analysis, log correlation, or predictive alerting.
An understanding of the challenges and risks associated with AI/ML systems in a production environment.
Certifications:
Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Azure Solutions Architect, GCP Professional Cloud Architect).
Certifications in ITIL, SRE, or other relevant frameworks.

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
