3 - 7 years

5 - 9 Lacs

Posted: 1 week ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

1. Act as a subject matter expert (SME) for all monitoring tools and systems.
2. Predict major incidents and outages.
3. Define automation strategies to reduce manual intervention.
4. Establish best practices for system reliability, monitoring, and performance tuning.
5. Collaborate with CloudOps, DBA, Business, and Development teams to reduce incidents and improve system resilience.
6. Conduct important stakeholder meetings, e.g. with the business team and senior management.
7. Play a major role in escalating Impact 1 calls and ensuring SLAs are maintained.
8. Shadow team members and run spot checks so that any mistakes are caught quickly.
Requirements
Team Player - Provides technical mentorship and guidance to the team.
Communication - Engages with different teams to build better communication and relations.
Incident Handling - Proactively escalates and resolves critical technical issues.
Process Compliance - Implements automation and process optimization, and maintains data security.
Technical Depth - Deep expertise in systems, monitoring tools, and troubleshooting.
Spot Checking Team Members - Spot-checks team members' work to catch mistakes.
Preparing Splunk dashboards and ad-hoc reports.

Incident Escalation Efficiency: Ensure Impact 1/2 incidents are resolved within defined SLAs.
Problem Management: Conduct thorough RCA for recurring incidents and implement permanent fixes to prevent recurrence.
Automation: Work on automation initiatives to reduce manual efforts.
Monitoring & Alerting Enhancements: Improve monitoring tools and alert mechanisms to enable proactive issue detection and prevention.
Technical Mentorship: Train and mentor junior engineers to enhance overall technical expertise within the team.
Stakeholder & Cross-Team Collaboration: Act as a bridge between the Dev, Ops, and Business teams to keep communication effective during any escalation.
Compliance & Security Adherence: Ensure data provided to end users is secure and complies with regulations and security standards.
Learning & Innovation: Research and implement emerging technologies and best practices to improve monitoring support efficiency.
Benefits
  • 5 Days Working
  • One Complimentary Meal per Day
  • Internet Reimbursement
  • Gym Reimbursement
  • Group Medical Insurance
  • Mental Health support benefits
  • Relocation Assistance (if Applicable)
