Posted: 2 months ago
Work from Office
Full Time
Who you are:
As a Site Reliability Engineering (SRE) Manager in Storage, you will lead a team of software and system engineers responsible for the reliability, availability, performance, security, and maintainability of our services. You will work closely with the development, Release, and L2 teams to design, build, and maintain robust, scalable, and highly available cloud infrastructure. You will bring a strong engineering focus to the team: preventing incidents and improving observability, automation frameworks, self-service infrastructure, logging and metrics, and operational reporting. You will be expected to drive the team to use standard tools for logging, monitoring, event management, notification, Runbook Automation, ChatOps, and Root Cause Analysis. You will work with Automation Engineers and QA Engineers to ensure seamless delivery of our service offerings, and you will build sufficient expertise in the IBM Cloud control plane (IMS) to create automated monitoring processes.

Responsibilities:
- Keep the service up and running, and restore it quickly when a failure occurs.
- Work closely with internal partners and teams to ensure that our infrastructure meets security, SLA, and performance requirements.
- Drive the team in writing, updating, and using documentation, including runbooks/playbooks.
- Drive automation efforts covering infrastructure needs, testing, failover solutions, failure mitigation, and more.
- Guide the team in debugging complex problems across the entire stack and creating solid solutions.
- Continuously test application and infrastructure resiliency under a variety of error conditions.
- Partner with the security team to develop plans and automation that respond aggressively and safely to new risks and vulnerabilities.
- Develop, communicate, and monitor standard processes that promote the long-term sustainability and health of operational development tasks.
- Use metrics and analytics to identify reliability issues and eliminate them through automation and tooling.
- Be an advocate for our customers, providing them with self-diagnosis tools to resolve common issues that arise in the field.

Required education:
- Bachelor's Degree

Required technical and professional expertise:
- 7+ years of IT project/delivery management experience.
- A solid understanding of cloud infrastructure/operations is a must.
- Experience leading teams and working with global teams.
- Experience debugging complex problems.
- Experience designing, building, and operating large-scale production systems.
- Experience with DevOps engineering or SRE.
- Experience with industry-standard tools for monitoring and observability.
- A strong understanding of diverse infrastructure platforms and infrastructure concepts is required.
- Must have good experience in infrastructure operations automation and IT Service Management, with hands-on exposure to data center administration, configuration, incident management, and support.
- Strong communication skills.

Preferred technical and professional experience:
- IBM Cloud knowledge.
- Solid understanding of SRE and security best practices.
- Experience with the Software Development Life Cycle, Test-Driven Development, Continuous Integration, and Continuous Delivery.
IBM