Job Description
As a Monitoring and Observability Engineer (Prometheus & Grafana Specialist) at Infobell IT, you will be responsible for designing, implementing, and optimizing robust monitoring solutions using Prometheus and Grafana for critical infrastructure and applications.

**Key Responsibilities:**
- Designing, implementing, and maintaining comprehensive monitoring and alerting solutions using Prometheus and Grafana.
- Developing and optimizing complex PromQL queries for efficient data retrieval and analysis from Prometheus.
- Customizing Grafana dashboards extensively to visualize large and complex datasets, ensuring optimal performance, readability, and actionable insights.
- Designing, developing, and maintaining custom Grafana plugins using JavaScript, TypeScript, React, and Go to extend Grafana's core functionalities.
- Integrating Prometheus and Grafana with various data sources, including databases, cloud services, APIs, and log aggregation tools.
- Configuring and managing Alertmanager for effective alert routing, notification, and escalation.
- Troubleshooting and resolving issues related to Prometheus and Grafana instances, data collection, query performance, and dashboard rendering.
- Collaborating closely with SRE, DevOps, and development teams to understand monitoring requirements and translate them into effective observability solutions.
- Implementing and advocating for best practices in monitoring, alerting, and observability.
- Optimizing Prometheus storage and data retention policies for scalability and cost-effectiveness.
- Automating the provisioning and configuration of Prometheus and Grafana using Infrastructure as Code (IaC) tools.
- Staying up-to-date with the latest trends and advancements in the Prometheus and Grafana ecosystems.

**Qualifications Required:**
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 2+ years of hands-on experience in implementing and managing Prometheus and Grafana in production environments.
- Advanced proficiency in Grafana dashboard customization and developing custom Grafana plugins.
- Strong expertise in PromQL for querying, aggregating, and analyzing time-series data.
- Solid understanding of monitoring best practices, alerting strategies, and observability principles.
- Experience with containerization technologies, cloud platforms, and scripting languages.
- Strong problem-solving, analytical, and debugging skills.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.