Required: 3-5 years of experience with Azure
You will work with
A multi-disciplinary squad, engaging enterprise platform teams, data platform teams, vendors, and third-party resources in the resilient and optimal operation of one or more business-critical platforms.
Let me tell you about the role
As a site reliability engineer, you will be responsible for building, maintaining, and operating the software solutions, infrastructure, and services that power technology platforms. In this role, you will work with a team of engineers and other team members to ensure that digital solutions are highly available, scalable, and secure. You will also be responsible for automating routine tasks, improving solution performance, and providing technical support to other teams.
What you will deliver
- Ensure the reliability, performance, and scalability of large-scale, cloud-based applications and infrastructure.
- Create automated solutions to improve operational aspects of the site.
- Ensure that applications and websites run smoothly and efficiently.
- Detect issues and automatically manage failures to keep systems up and running.
- Work with software developers, engineers, and operations teams to improve system performance.
- Analyse incidents to prevent future disruptions.
What you will need to be successful (experience and qualifications)
Technical skills
- A bachelor's degree in computer science, engineering, or a related field, or equivalent work experience.
- Relevant certifications (e.g., Azure cloud engineering, fundamentals, DevOps, or architect certifications) can be helpful.
- Knowledge of networking concepts, protocols, and tools.
- Willingness to learn new technologies and adapt to changing environments.
- Skilled in managing configuration, deployments, and observability; handling and resolving incidents, including root cause analysis; and managing and operating complex systems for scalability, availability, and performance.
- Strong communication and collaboration skills to work effectively with development and operations teams.
Software skills
- Skilled in programming languages such as Python, Go, Java, or Ruby, and in scripting with Bash or PowerShell.
- Skilled in software engineering practices for the full SDLC, including coding standards, code reviews, source control management, continuous deployments (e.g., Jenkins, GitLab CI, or CircleCI), testing, and operations.
- Skilled in building complex software systems end-to-end that have been delivered and operated optimally in production. Should understand security and privacy standard methodologies, as well as how to properly monitor, log, and alert on production systems.
Infrastructure skills
- Strong knowledge of Linux/Unix systems, including system configuration, networking, and debugging.
- Expert in building and scaling infrastructure services using Microsoft Azure.
- Skilled with infrastructure-as-code tools such as Ansible, Puppet, Chef, or Terraform, monitoring tools (e.g., Prometheus, Grafana), and logging systems (e.g., ELK stack).
- Skilled in using core cloud application infrastructure services, including identity platforms, networking, storage, databases, containers, and serverless.
- Strong knowledge of databases, such as relational, graph, document, and key-value stores, including performance tuning and improvement.