IT Operations Infrastructure Consultant [Prometheus & Grafana] - EG CloudOps

At EG, we develop software for our customers so they can focus on their profession. Our industry-specific software is built by peers from the industry, and backed by the scale of EG for stability, innovation, and security. We are committed to advancing industries by tackling big challenges such as resource use, efficiency, and sustainability.

We are a thriving global workforce of 3000+ employees, with an 850+ strong team based in Mangaluru, India. We have a people-first culture fostering innovation, collaboration, and continuous learning. Join us in creating software that works for people, not software that makes people work.

Visit our career page to meet some of your future colleagues, explore our culture, and watch our video "We Make a Difference".

We are looking for an experienced and independent L3 Engineer with a deep understanding of Grafana and Prometheus to join our CloudOps team. In this role, you will be responsible for maintaining, optimizing, and advancing our monitoring and observability systems. Your expertise will be critical in ensuring the reliability, performance, and scalability of our infrastructure. You will own the overall health, availability, and configuration of our Grafana and Prometheus solutions running on Kubernetes.

Are you ready to be part of an amazing team of 20 engineers who are the backbone of our operations? At CloudOps, we manage, support, and troubleshoot over 3000 servers and our large iPower farm that runs over 150 different products. Our mission is to ensure our IT platforms operate smoothly with minimal downtime. We're looking for passionate individuals to join our dynamic and energetic team. If you thrive in a fast-paced environment and love solving complex challenges, CloudOps is the place for you!

Key responsibilities:

Grafana and Prometheus Administration for more than 3,000 VMs and services:
- Configure, maintain, and scale Grafana and Prometheus instances.
- Develop and implement custom dashboards for monitoring key metrics.
- Troubleshoot issues, ensure data accuracy, and optimize query performance.
- Add new hosts and services and reconfigure existing ones.

Monitoring and Alerting:
- Design and manage alerting rules for proactive issue identification and resolution.
- Continuously improve and expand monitoring coverage to meet evolving needs.
- Collaborate with teams to define alert thresholds and escalation procedures.

Data Analysis and Visualization:
- Analyze metrics data to identify performance bottlenecks and areas for improvement.
- Create meaningful visualizations and reports to provide insights for stakeholders (infrastructure and Business Units).

Scaling and Optimization:
- Collaborate with the CloudOps Infrastructure Team to ensure seamless integration and scalability of Grafana and Prometheus.
- Fine-tune configurations to achieve optimal resource utilization and performance.

Requirements:
- 3 years of proven experience as an L3 Engineer specializing in Grafana and Prometheus administration.
- Proficiency in creating custom Grafana dashboards and queries.
- Strong understanding of monitoring best practices, alerting, and data analysis.
- Scripting and automation skills for efficient system management.

Technical skills:
- Practical knowledge of Prometheus, Thanos, Grafana, and Alertmanager, including windows_exporter, node_exporter, mssql_exporter, postgres_exporter, Grafana Alloy, and others.
- Knowledge of Windows Server (versions from 2008) and Linux operating systems (Ubuntu, AlmaLinux, CentOS, Flatcar, MicroOS, SUSE).
- Knowledge of network protocols.
- Basic knowledge of MS SQL, MySQL, IBM DB2, MongoDB, and MariaDB databases.
- Basic knowledge of IaaS and Azure/AWS services.
- Knowledge of Bash and PowerShell scripting.

What can you expect from us:
- A professional and business-driven environment with lots of exciting projects
- Super talented and committed colleagues who know that they only get better through collaboration and active knowledge sharing
- Possibility of personal and professional development
- Targeted training courses in our EG Academy
- Best in industry employee benefits