NOC Manager - Infrastructure Support

5 - 10 years

8 - 15 Lacs

Posted: 21 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Role & responsibilities

Team Leadership and Management:
  • Oversee NOC Operations: Supervise daily NOC operations to ensure continuous monitoring, incident management, and timely escalation of issues.
  • Manage NOC Staff: Lead a team of NOC engineers and analysts, handle shift schedules to ensure 24/7/365 coverage, and provide guidance on troubleshooting and escalation processes.
  • Training and Development: Ensure team members are up to date with the latest technology and operational standards, providing training, mentorship, and development opportunities.
  • Performance Management: Conduct regular performance evaluations, set KPIs, and implement improvement plans as needed to enhance team performance.

Incident Management and Response:
  • Incident Oversight: Take charge of major incidents, ensuring rapid assessment, resource allocation, and timely resolution of critical issues.
  • Escalation and Coordination: Coordinate between NOC, support teams, and vendors to resolve incidents, escalating as required for effective resolution.
  • Root Cause Analysis (RCA): Lead RCA for significant incidents, collaborating with cross-functional teams to identify underlying issues and implement preventive measures.
  • Incident Reporting: Develop and maintain incident documentation, reporting to upper management on significant incidents, resolution timelines, and preventive actions taken.

Monitoring and Infrastructure Optimization:
  • System Monitoring: Ensure comprehensive 24/7 monitoring of network, servers, applications, and data center infrastructure for potential issues or anomalies.
  • Optimization of Tools: Evaluate and implement advanced monitoring tools, improving visibility, alerting capabilities, and data-driven insights into infrastructure performance.
  • Performance Metrics: Develop and refine performance metrics, dashboards, and reports to monitor infrastructure health, availability, and response times.

Change Management and Compliance:
  • Change Control: Oversee infrastructure changes to minimize service disruptions, participating in change advisory boards (CABs) and ensuring adherence to change management policies.
  • Risk Management: Evaluate risks associated with planned changes and ensure risk mitigation strategies are in place.
  • Compliance and Standards: Ensure NOC operations meet regulatory compliance standards (e.g., ISO, SOC, GDPR) and follow industry best practices for security, privacy, and data protection.

Automation and Process Improvement:
  • Process Optimization: Analyze and streamline NOC processes for incident handling, escalation, reporting, and documentation, focusing on efficiency and consistency.
  • Automation: Identify repetitive tasks and opportunities for automation (e.g., alert triage, routine system checks) to reduce manual workload and improve response times.
  • SOP Development: Create and maintain standard operating procedures (SOPs) for common incident scenarios, ensuring consistent handling and faster resolution.

Capacity and Performance Planning:
  • Infrastructure Scaling: Collaborate with IT and engineering teams to anticipate capacity needs based on growth trends, scaling resources proactively to avoid downtime.
  • Performance Reviews: Regularly assess system performance, identifying bottlenecks or risks to service reliability, and coordinate with infrastructure teams to address issues.
  • Budget Management: Oversee NOC-related budget, ensuring optimal use of resources and advocating for necessary investments in monitoring tools and staffing.

Vendor and Stakeholder Management:
  • Vendor Liaison: Manage relationships with vendors and service providers, ensuring service level agreements (SLAs) are met and coordinating support during critical incidents.
  • Stakeholder Communication: Act as a point of contact for internal stakeholders regarding NOC operations, infrastructure status, and ongoing incidents.
  • SLA and KPI Management: Define, track, and report SLAs and KPIs for NOC performance, continuously working with vendors and internal teams to meet or exceed targets.

Reporting and Analytics:
  • Operational Reporting: Generate daily, weekly, and monthly reports on NOC performance, including metrics like uptime, incident resolution times, and resource utilization.
  • Trend Analysis: Identify recurring incidents or performance trends, using data to improve incident management processes and proactively address potential issues.
  • Executive Summaries: Provide high-level summaries and performance reports to executive leadership, highlighting key metrics, incidents, and NOC achievements.

Preferred candidate profile

  • Technical Expertise: Deep understanding of networking, server management, data centers, cloud infrastructure, and monitoring tools (e.g., SolarWinds, Nagios, Datadog).
  • Leadership and People Management: Strong leadership abilities, with experience in managing cross-functional teams and fostering a high-performance, collaborative environment.
  • Incident Management: Proficiency in ITIL-based incident and problem management, especially with high-stakes or complex incidents.
  • Analytical Skills: Ability to analyze large datasets for patterns and insights, using data to improve operations and prevent outages.
  • Change and Compliance Management: Knowledge of change management frameworks, regulatory compliance standards, and risk assessment.
  • Communication: Excellent communication skills to effectively liaise between technical teams, stakeholders, and upper management.
  • Budgeting and Resource Allocation: Familiarity with budget management and resource planning for NOC operations.
  • Promote a Proactive Approach: Focus on preventive maintenance, capacity planning, and performance monitoring to anticipate and mitigate issues before they escalate.
  • Foster a Continuous Improvement Culture: Regularly review NOC processes and metrics, encouraging feedback from team members to drive improvements.
  • Focus on Team Training: Ensure the team is well-trained in both technical skills and soft skills, such as communication and time management.
  • Certification: Certifications in cloud technologies, networking, or DevOps are advantageous.
  • Shift Flexibility: Willingness to work in shifts, including weekends and off-hours if necessary.
  • Ensure Clear Communication Channels: Establish clear communication protocols for incidents, ensuring stakeholders are promptly updated during outages or major events.
  • Problem-Solving: Exceptional analytical skills for diagnosing and resolving complex IT issues.

Zybisys Consulting Services

Consulting Technology

Tech City

50 Employees

4 Jobs

Key People

  • John Smith, CEO
  • Jane Doe, CTO
