Service Reliability Infra Specialist

5 - 10 years

12 - 16 Lacs

Posted: Just now | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Overview:


We are seeking a skilled and experienced Service Reliability Specialist to join our diverse team as part of the newly created Service Reliability Centre (SRC). In this role, you will help improve the availability and performance of Arm infrastructure by using Arm's AI Operations (AIOps) and observability platforms. You will collaborate closely with development and platform teams to build and maintain robust observability and response processes.

Responsibilities:


  • Serve as the primary technical contact during critical incidents for both on-premise and cloud infrastructure.
  • Lead Root Cause Analysis (RCA) for major incidents, identifying contributing factors and actionable remediation.
  • Utilize Dynatrace and ServiceNow for correlation analysis, system tracing, and optimizing alerts and visibility.
  • Perform detailed diagnostics for virtualization, storage, operating systems, and cloud services during incidents.
  • Develop clear and comprehensive runbooks, diagnostic guides, and incident documentation.
  • Collaborate post-incident with platform teams to implement improvements via automation, tuning, or design enhancements.
  • Coordinate improvements in monitoring, event correlation, and response processes with platform and tooling teams.
  • Automate routine diagnostic tasks using scripting (Ansible, Python).
  • Provide technical expertise during service onboarding, including setting alert rules, thresholds, and RCA guidelines.

Required Skills and Experience:


  • 5+ years in Infrastructure Operations or Platform Support.
  • Skilled in detailed root cause analysis and impact assessments in complex environments (cloud-native and legacy).
  • Expertise with observability tools (Dynatrace, Datadog, Splunk).
  • Proficient in managing Linux/Windows servers, virtualization, storage, and identity platforms (LDAP, Azure AD).
  • Strong scripting skills (Python, PowerShell, Bash) and infrastructure automation experience using Ansible.
  • Familiarity with ITSM processes and incident management using ServiceNow.
  • Comfortable with independent work and flexible shift schedules (including off-hours/weekends) as part of a global team.
  • Excellent documentation and communication skills to translate technical issues into actionable insights.
  • Capable of analyzing incident trends and recommending reliability improvements.
  • Knowledge of virtualization, storage infrastructure, high-performance computing, and cloud services.
  • Experience with User Access Management (UAM) and Identity Access Management (IAM) on-premise (OUD LDAP) and Azure AD.
  • Experience maintaining Windows and Linux operating systems.
  • Proficient with engineering tools (GitHub, Jira, Confluence).

Nice To Have Skills and Experience:


  • Exposure to high-performance computing or cloud-native services
  • Knowledge of CI/CD tooling (e.g., Jenkins, GitLab) or container-based systems
  • Experience defining SLIs and SLOs, and building service health dashboards

In Return:



Accommodations at Arm
At Arm, we want to build extraordinary teams. To note, by sending us the requested information, you consent to its use by Arm to arrange for appropriate accommodations. All accommodation or adjustment requests will be treated with confidentiality, and information concerning these requests will only be disclosed as necessary to provide the accommodation. Although this is not an exhaustive list, examples of support include breaks between interviews, having documents read aloud, or office accessibility. Please email us about anything we can do to accommodate you during the recruitment process.

Equal Opportunities at Arm

Hybrid Working at Arm

ARM Embedded Technologies

Technology / Embedded Systems

San Jose

50-200 Employees

26 Jobs

    Key People

  • Jane Doe

    CEO
  • John Smith

    CTO
