Site Reliability Engineer, Incident Management

1 - 5 years

0 Lacs

Posted: 5 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As a Site Reliability Engineer - Incident Management, you will monitor, maintain, and manage the entire Qualys infrastructure and the services installed at different data centers. When a product or service malfunctions, you will troubleshoot, repair, and restore the service or system promptly to ensure maximum availability and performance. The role also involves supporting Engineering and other technical teams, collaborating for quicker issue resolution, and performing end-to-end incident management, documentation, and task automation.

Your main responsibilities will include monitoring the performance and capacity of computer systems and using various tools to identify and address issues effectively. You will be expected to conduct basic troubleshooting of platform and product issues, use tools such as Splunk, Grafana, and Kibana for performance checks, and manage PagerDuty. You will also assist in task automation wherever applicable, ensure timely resolution of incident tickets, and triage and troubleshoot problems affecting products or services.

You must track and document all issues and resolutions in detail in the ticketing and documentation tools to enhance the knowledge base and maintain a record of system health. When a complex issue cannot be resolved through troubleshooting, you should escalate it to management, IT resources, or third-party vendors for further assistance. Communicating within the team and with external stakeholders, keeping them informed of relevant information, known issues, and the steps being taken, will be an integral part of your role.

The Site Reliability Engineer - Incident Management team operates 24x7x365 on a monthly shift rotation basis as per requirements.

To excel in this role, you should have one to two years of IT Operations (infrastructure, system administration, Linux) experience or a relevant certification. Familiarity with monitoring and integration tools such as Splunk, Prometheus, Grafana, Kibana, PagerDuty, and Runscope, and with incident management tools such as Jira or ServiceNow, is beneficial. A good understanding of the main ITSM functions and tools, along with strong interpersonal skills to interact professionally with employees at all levels, is essential. Certifications in computer functionality, Linux, System Admin, VMware, IT Security, or ITSM/ITIL, and knowledge of DevOps/SRE basics, Python, and Cloud will be advantageous.

Qualys

Computer and Network Security

Foster City CA
