Grafana Stack Consultant

5 - 9 years

0 Lacs

Posted: 1 day ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As a highly skilled Grafana Stack Consultant, you will manage and scale self-hosted observability platforms. You will be responsible for conducting in-depth performance analysis of the Grafana observability stack, particularly Loki, Tempo, and Mimir.

Key Responsibilities:

- Identify and remediate performance bottlenecks in log ingestion, indexing, and query execution
- Optimize system configurations for better throughput, latency, and fault tolerance
- Troubleshoot and resolve instability and downtime in the observability pipeline
- Recommend and implement best practices for operating the Grafana stack in Docker Swarm environments
- Collaborate with internal engineering teams to guide architecture improvements and ensure the observability stack aligns with reliability goals

Qualifications Required:

- 5 years of hands-on experience with self-hosted Grafana, Loki, Tempo, and Mimir
- Strong understanding of observability architectures, including metrics, logs, and tracing
- Proven track record of operating these components at scale, especially in high-ingestion environments (e.g., 500 GB+/day)
- Proficiency in container orchestration, especially Docker Swarm; familiarity with Linux VM-based deployments
- Experience integrating and tuning object storage backends such as Wasabi S3 or other S3-compatible services
- Ability to diagnose complex issues across distributed systems and propose effective, sustainable solutions
- Excellent problem-solving skills and ability to work independently in a remote environment

Additional Company Details:

The existing observability infrastructure comprises Grafana, Loki, Tempo, and Mimir, operating in a Docker Swarm environment across multiple Linux VMs. The system ingests approximately 500 GB of logs per day and uses Wasabi S3 as the object storage backend. The platform has been facing performance degradation, including slow search queries, delays in log ingestion, and periodic outages affecting system observability.

(Note: Experience migrating from Docker Swarm to Kubernetes (K8s), familiarity with CI/CD practices, and infrastructure automation are considered nice-to-have skills for this role.)

Radioactive Technologies

Technology / Environmental Services

Washington
