
1 Thanos Grafana Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

6.0 - 9.0 years

15 - 30 Lacs

Bengaluru

Hybrid

Technical Expertise and Experience

- Deep understanding of SRE concepts, including SLIs, SLOs, SLAs, error budgets, and reliability engineering best practices.
- Expertise in observability tools such as Prometheus, Thanos, Grafana, and CloudWatch is mandatory.
- Strong hands-on experience with at least one monitoring tool, with a proven ability to set up and manage monitoring and alerting systems.
- Proficiency in cloud platforms (AWS is mandatory).
- Strong scripting and automation skills, with proficiency in Python and Bash.
- Hands-on experience with infrastructure operations and observability.
- Extensive knowledge and hands-on experience across IT infrastructure, cloud platforms, and networking.
- Significant experience with Kubernetes, including running, managing, and troubleshooting containerized workloads.
- Experience working with version control systems like GitHub and implementing CI/CD pipelines is a plus.
- Experience with infrastructure-as-code (IaC) tools like Terraform or ARM templates is a plus.

SRE Expertise

- Define and implement SRE best practices for data platforms and data-driven applications, ensuring alignment with organizational goals.
- Provide mentorship and guidance to teams in adopting SRE principles and improving operational excellence.
- Collaborate with cross-functional teams to drive reliability, scalability, and performance across data engineering, data science, and platform engineering projects.
- Monitor system performance and implement solutions to improve stability and efficiency.
- Automate repetitive operational tasks using infrastructure-as-code and configuration management tools.
- Create and maintain CI/CD pipelines for automated testing and deployment.
- Participate in on-call rotations, respond to incidents, and lead post-mortems to drive continuous improvement.

Soft Skills

- Strong planning and organizational skills to manage individual and team responsibilities efficiently.
- Excellent problem-solving and troubleshooting skills, with the ability to analyze complex issues and implement effective solutions.
- Effective real-time communication, ensuring clear and concise updates for both technical and non-technical stakeholders.
- Ability to work under pressure and manage incidents effectively, ensuring timely resolutions and minimal downtime.
- Collaborative mindset with the ability to foster a culture of ownership, accountability, and continuous improvement.

Posted Date not available
