Cloud Reliability Engineer
This role has been designed as Hybrid with an expectation that you will work on average 2 days per week from an HPE office.
What you'll do:
- Work on tools and technologies that involve monitoring, automating, and improving systems through software engineering principles applied to IT operations.
- Harness the power of data and metrics to make evidence-based improvements that enhance the way we operate COM.
- Build and maintain comprehensive metrics collection systems
- Partner with feature development teams on best practices to ensure we have global team visibility of our applications' health, SLIs, and SLOs
- Use data to gain insight into the COM stack for the purpose of improving performance, reliability, and cost effectiveness
- Build out robust documentation and runbook standards that our teams use to improve incident response effectiveness. This includes measuring the effectiveness of our documentation and building analytics and other tooling to ensure our teams produce relevant, useful content for our on-call teams
- Implement and maintain security controls and practices to protect systems from unauthorized access and attacks.
What you need to bring:
- Bachelor's or Master's degree in Computer Science, Information Systems, or equivalent
- Typically 2-8 years of experience
- Development experience with Python, Go, Java, or a similar programming language (such as C#, C++, or C)
- Good understanding of REST APIs and the fundamentals of successful design and testing of a REST API
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
- Have an urge to collaborate and communicate asynchronously
- Good understanding of distributed systems, event-driven programming paradigms, and designing for scale and performance
- Ability to troubleshoot complex issues with curiosity, flexibility, creativity and a sense of ownership and accountability
- Strong communication skills and ability to work in a distributed team
- Highly desirable: experience with one or more of Grafana, Prometheus, AWS, Kubernetes, Terraform
As a part of our team you will be able to learn or apply existing knowledge in the following areas:
- Your contributions will have visible and technical impact on how our service teams operate and how we serve our customers:
  - Providing informed technology inputs or proposals to management that consistently improve the value to the business
  - Demonstrating breadth of influence across multiple organizations within the business
  - Evolving existing technologies or introducing new technologies that improve the value of products and services to our customers
- Your work product demonstrates technical creativity and innovation:
  - Developing leading-edge technologies and new solutions that provide positive change in how customers perceive our products and services
  - Providing technology contributions to improve time to market and/or overall quality and customer experience
- Your contributions will consider the broader context:
  - Working in a broad context that includes partner groups, other organizations, strategic program objectives, the customer value chain, etc.
  - Producing technical results that balance a mixture of characteristics, including robustness, reusability, cost sensitivity, and customer appropriateness
Cloud Architectures, Cross Domain Knowledge, Design Thinking, Development Fundamentals, DevOps, Distributed Computing, Microservices Fluency, Full Stack Development, Release Management, Security-First Mindset, User Experience (UX)
Job: Engineering
Job Level: TCP_02