Introduction
At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk.
About Business Unit
IBM Systems helps IT leaders think differently about their infrastructure. IBM servers and storage are no longer inanimate - they can understand, reason, and learn so our clients can innovate while avoiding IT issues. Our systems power the world's most important industries and our clients are the architects of the future. Join us to help build our leading-edge technology portfolio designed for cognitive business and optimized for cloud computing.
Your Role And Responsibilities
As an Entry-Level Site Reliability Engineer (SRE) on the OneIT Lab Engineering Team, you will join a team dedicated to ensuring the reliability, scalability, and performance of IBM systems and infrastructure. This role is critical to advancing key IBM Power System development initiatives, and you will gain hands-on experience with both physical hardware and software environments.
Your Responsibilities Will Include, But Are Not Limited To
- Assisting in the setup, configuration, and maintenance of IBM Power servers and related infrastructure.
- Supporting software-related reliability initiatives such as automation, monitoring, performance tuning, and system optimization.
- Participating in incident response, diagnostics, and root-cause analysis for both hardware and software issues.
- Collaborating with cross-functional teams to ensure smooth integration between physical systems and application environments.
- Supporting projects related to lab analytics: gathering, analyzing, and interpreting data to help guide better business and operational decisions.
- Contributing to the deployment, scaling, and ongoing maintenance of production and test systems.
- Writing clear, concise documentation for processes, configurations, and troubleshooting steps.
- Learning and applying best practices in systems reliability, observability, and infrastructure operations.
You will be expected to grow into a well-rounded SRE capable of tackling challenges in both the physical, data-center-like environment and the software layer that powers our services. Mentorship and hands-on training will be provided to help you develop the skills to excel in both domains.
Preferred Education
Bachelor's Degree
Required Technical And Professional Expertise
- 2+ years of work experience as a Site Reliability Engineer.
- Passion for eliminating repetitive manual processes using automation.
- Strong attention to detail and excellent analytical capabilities.
- Excellent troubleshooting, problem solving, and debugging skills.
- Proficiency in programming concepts and frameworks.
- Proficiency in scripting/coding for automation using Python, shell scripting (Bash, etc.), Ansible, and related tools and languages.
- Familiarity with server operations, virtualization, and related infrastructure concepts.
- Fundamental understanding of computer networks.
- Fundamental understanding of data science/analytics frameworks.
- An automation mindset: wherever possible, you should use scripting and automation.
- Ability to work independently and as part of a team to achieve the SRE agenda.
- Ability to complete project work, both supervised and unsupervised.
- Ability to effectively prioritize and execute tasks in a high-pressure environment.
- Good written, oral, and interpersonal communication skills.
Preferred Technical And Professional Experience
- Fundamental understanding of Linux/Unix systems is a plus.
- Fundamental knowledge of Red Hat OpenShift and Kubernetes is a plus.
- Automation/Scripting: In-depth experience with Ansible, Python, Terraform, and CI/CD tools is a plus, but a fundamental understanding is a must.
- Hands-on experience crafting alerts and dashboards using Python or any other language.