Job Description
As a DevOps Engineer in Storage, you will ensure that the designed solution meets non-functional requirements such as reliability, availability, performance, security, and maintainability. You will work closely with the development team and the related Release and L2 teams.
- Keep the Backend Storage environment highly available and resilient.
- You will bring a strong engineering focus to operations, putting your energy into incident prevention, observability, automation frameworks, self-service infrastructure, logging and metrics, and operational reports.
- You will be expected to use tools for logging, monitoring, event management, notification, Runbook Automation, ChatOps, and Root Cause Analysis.
- You will work with Automation Engineers and QA Engineers to ensure seamless delivery of our service offerings.
- Build sufficient expertise in the IBM Cloud control plane (IMS) to create automated monitoring processes.

Responsibilities:
- A Storage Support Engineer is responsible for diagnosing and troubleshooting technical issues related to NetApp storage hardware and software, providing timely solutions to customers through phone, email, and remote sessions, acting as a primary point of contact for resolving complex technical problems, and collaborating with other teams to deliver optimal customer support. This often requires in-depth knowledge of NetApp's ONTAP operating system, RAID concepts, and the Ethernet, FC, and iSCSI protocols, as well as familiarity with NetApp hardware such as FAS and AFF arrays.
- Keeping your assigned site or service up and running, or getting it back up and running quickly when a failure occurs.
- Working closely with internal partners and teams to ensure that our infrastructure meets security, SLA, and performance requirements.
- Writing, updating, and using documentation, including runbooks and playbooks.
- Automating work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more.
- Debugging complex problems across an entire stack and creating solid solutions.
- Developing CI/CD processes to improve cadence.
- Persistently testing application and infrastructure resiliency under a variety of error conditions.
- Partnering with security engineers and developing plans and automation to respond aggressively and safely to new risks and vulnerabilities.
- Developing, communicating, and monitoring standard processes to promote the long-term health and sustainability of operational development tasks.
- Standing up and maintaining pre-production and developer environments to support the entire development organization and improve overall team velocity.
- Using metrics and analytics to identify reliability issues and remove them through automation and tooling.
- Being an advocate for our customers, providing them with self-diagnosing tools to resolve common issues that arise in the field.

Required education: Bachelor's Degree

Preferred education: Master's Degree

Required technical and professional expertise:
- 7+ years of total experience with any Enterprise Storage Platform, such as NetApp, EMC/Dell, Pure, Hitachi, HP, or IBM.
- A solid understanding of Cloud infrastructure and operations is a must.
- Knows their way around a Unix/Linux shell, can write shell scripts, and understands Linux internals.
- Experience debugging complex problems.
- Experience designing, building, and operating large-scale production systems.
- Expertise in Ansible, Bash, and core Python development.
- Strong familiarity with one of C, C++, Go, Python, or Java.
- Experience with DevOps engineering or SRE.
- Experience with standard industry tools for monitoring and observability.
- Experience automating infrastructure, configuration management, testing, and deployments using tools like Ansible and Chef, and the ability to explain the Infrastructure as Code paradigm.
- A strong understanding of diverse infrastructure platforms and infrastructure concepts is required.
- Hands-on experience using source control and feature branching strategies.
- Understands networking and messaging, especially between services.
- Must have good experience in Infrastructure Operations automation and IT Service Management, with hands-on exposure to data center administration, configuration, incident management, and support.
- Strong communication skills.

Preferred technical and professional experience:
- IBM Cloud API knowledge.
- Hands-on work experience with any storage product (NetApp, EMC/Dell, etc.).
- Behavior Driven Development.
- Experience in Software Development Life Cycle, Test Driven Development, Continuous Integration, and Continuous Delivery.
- Familiarity with cloud deployment tooling such as Razee and LaunchDarkly.