Posted: 10 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

What would an SRE do here?

1. Manage and maintain day-to-day BAU operations, including monitoring system performance, troubleshooting issues, and ensuring high availability.
2. Build infrastructure as code (IaC) patterns that meet security and engineering standards.
3. Build CI/CD pipelines using Octopus, GitLab CI, and cloud-native toolchains like ArgoCD.
4. Build and maintain automation scripts and tools to streamline operational processes.
5. Ensure observability of system uptime is in place, and triage issues with the relevant service teams and stakeholders.
6. Manage the observability setup, including metrics and logging, and extend its capabilities through proficient use of PromQL queries.
7. Build comprehensive, detailed runbooks to manage, detect, remediate, and restore services.
8. Collaborate with engineering teams to provide quicker solutions during firefighting and help improve the overall process.
9. Support the operations team in managing BAU by monitoring and analyzing system logs and performance metrics to identify areas for improvement and take proactive measures.
10. Stay up to date with industry trends and best practices in SRE, observability, alerting, and infrastructure automation.
11. Actively participate in rotational shift and on-call duties to ensure continuous operational support.
12. Communicate effectively with technical peers and team members in both written and verbal formats.

What we are looking for in a new hire

1. At least 2 years of experience as an SRE, with strong knowledge of cloud computing platforms, preferably Azure.
2. Cross-functional knowledge of Linux systems, storage, networking, security, and databases.
3. Experience with container orchestration tools like Kubernetes.
4. Proficiency in languages such as Python, Go, etc.
5. Ability to develop and maintain software written in any programming language.
6. Experience working with continuous integration and continuous delivery tooling and practices (e.g., GitLab, ArgoCD, Octopus).
7. Experience in monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.
8. Excellent communication and collaboration skills.

My Connections CareStack™ - Dental Practice Management
