Assignment Description
The role focuses on observability, release, and operations (resilience is a plus). Candidates must provide examples of implementation around these key areas.
Job Summary
The Site Reliability Engineer will be responsible for developing, testing, and maintaining high-quality software solutions while ensuring stability and reliability across systems. Proficiency in programming languages and cloud service providers is essential, alongside solid problem-solving and teamwork skills.
Main Responsibilities
- Develop, test, and maintain high-quality software solutions, frameworks, and automations.
- Collaborate with cross-functional teams to analyze requirements and design solutions around stability and reliability.
- Participate in code reviews to ensure code quality and shared knowledge.
- Identify, troubleshoot, and resolve incidents and problems, ensuring adherence to DevOps/SRE best practices.
- Contribute to continuous improvement initiatives within the engineering team.
Key Requirements
- Proficiency in one or more programming/scripting languages, such as Python.
- Solid understanding of Agile development methodologies.
- Willingness to work with operations and incident/problem management.
- Good knowledge of at least one major cloud service provider: Microsoft Azure or GCP.
- Experience building CI/CD workflows using GitHub Actions.
- Experience setting up observability (application and infrastructure) using tools such as Splunk or Grafana.
- Familiarity with version control systems such as Git.
- Good problem-solving skills and eagerness to learn.
- Excellent communication and teamwork skills.
Nice to Have
- Understanding of security assessments and compliance standards.
Other Details
This role may involve working remotely and participating in an on-call rotation for incident management. The focus is on continuous improvement and collaboration within a dynamic engineering team.