Site Reliability Engineer

Knowmax

Experience: 1 - 4 years

Salary: INR 3 - 6 Lacs

Location: Gurugram

Posted: 7 months ago

Skills Required

Root cause analysis, Change management, Linux, Networking, Problem management, Incident management, Troubleshooting, Continuous improvement, Operations, Performance improvement

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Title: Site Reliability Engineer 1 (SRE 1)

Overview: As a Site Reliability Engineer 1 (SRE 1) on the DevOps team, you will play a key role in maintaining and improving the reliability, performance, and scalability of our infrastructure. The role involves proactive monitoring, incident response, and continuous improvement of our systems. You will also work with clients on deployment support and ensure adherence to site reliability engineering best practices.

Responsibilities:

1. Infrastructure Monitoring and Support: Continuously monitor infrastructure using tools such as Prometheus, Alertmanager, and Grafana to ensure optimal performance and reliability. Implement and maintain monitoring solutions that detect and address issues proactively.

2. Client Interaction and Deployment Support: Work with clients to provide deployment support, ensuring smooth implementation and correct operation. Assist with on-premises and cloud-based deployments as needed.

3. Troubleshooting and Incident Management: Use your expertise in Linux, Docker, and networking to diagnose and resolve technical issues efficiently. Manage incidents from detection to resolution, minimizing the impact on service availability.

4. Problem Management and Root Cause Analysis: Conduct root cause analysis for recurring issues and implement solutions that prevent them from happening again. Identify and mitigate potential problems to improve system stability.

5. 24/7 Shift Support: Participate in a rotating shift schedule that provides 24/7 support, ensuring continuous monitoring and rapid response to incidents. Maintain high availability and reliability of services during off-hours.

6. Maintenance and Change Management: Execute maintenance tasks and implement change requests according to Standard Operating Procedures (SOPs). Follow change management protocols to maintain system integrity.

7. Documentation and Reporting: Maintain thorough documentation of incidents, changes, and maintenance activities. Generate and analyze reports to identify areas for performance improvement and optimization.

Requirements:

Technical Proficiency: Strong understanding of IT infrastructure, with expertise in monitoring tools such as Prometheus, Alertmanager, and Grafana. Proficiency in Linux, Docker, and networking is essential for troubleshooting and incident management.

Client Communication: Excellent interpersonal skills for effective client interaction and support.

Incident and Problem Management: Experience managing incidents, conducting root cause analysis, and implementing problem management strategies.

Adaptability and Shift Work: Ability to work rotating shifts for 24/7 support and to adapt to changing operational needs.

SOP Adherence: Ability to follow defined SOPs for maintenance tasks and change implementations.
