Cloud Storage SRE Manager

6 - 10 years

16 - 20 Lacs

Posted: 2 months ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Who you are:

As a Site Reliability Engineering (SRE) Manager in Storage, you will lead a team of software and system engineers in ensuring reliability, availability, performance, security, and maintainability. You will work closely with the development team and the related Release and L2 teams to design, build, and maintain robust, scalable, and highly available cloud infrastructure. You will bring a strong engineering focus to the team in preventing incidents and improving observability, automation frameworks, self-service infrastructure, logging and metrics, and operational reports. You will be expected to drive the team to use standard tools for logging, monitoring, event management, notification, Runbook Automation, ChatOps, and Root Cause Analysis. You will work with Automation Engineers and QA Engineers to ensure seamless delivery of our service offerings. You will also build sufficient expertise in the IBM Cloud control plane (IMS) to create automated monitoring processes.

Responsibilities:

  • Keep the service up and running, or get it back up and running quickly when a failure occurs
  • Work closely with internal partners and teams to ensure that our infrastructure meets security, SLA, and performance requirements
  • Drive the team in writing, updating, and using documentation, including runbooks/playbooks
  • Drive automation efforts, including infrastructure needs, testing, failover solutions, failure mitigation, and more
  • Guide the team in debugging complex problems across the entire stack and creating solid solutions
  • Persistently test application and infrastructure resiliency over a variety of error conditions
  • Partner with the security team to develop plans and automation that respond aggressively and safely to new risks and vulnerabilities
  • Develop, communicate, and monitor standard processes to promote the long-term health and sustainability of operational development tasks
  • Use metrics and analytics to identify reliability issues and remove them through automation and tooling
  • Be an advocate for our customers, providing them with self-diagnosing tools to resolve common issues that arise in the field

Required education

  • Bachelor's Degree

Required technical and professional expertise

  • 7+ years of IT project / delivery management experience
  • A solid understanding of Cloud infrastructure/operations is a must
  • Experience in leading teams and working with global teams
  • Experience debugging complex problems
  • Experience designing, building, and operating large-scale production systems
  • Experience with DevOps engineering or SRE
  • Experience with standard industry tools for monitoring and observability
  • A strong understanding of diverse infrastructure platforms and infrastructure concepts
  • Good experience in Infrastructure Operations automation and IT Service Management, with hands-on exposure to data center administration, configuration, incident management, and support
  • Strong communication skills

Preferred technical and professional experience

  • IBM Cloud knowledge
  • Solid understanding of SRE and security best practices
  • Experience in Software Development Life Cycle, Test Driven Development, Continuous Integration, and Continuous Delivery

IBM

Information Technology

Armonk

350,000 Employees

6362 Jobs

Key People

  • Arvind Krishna, Chairman and Chief Executive Officer
  • Ginni Rometty, Former Chairman, President and CEO
