Job Description
About The Role
Project Role: Solution Architect
Project Role Description: Translate client requirements into differentiated, deliverable solutions using in-depth knowledge of a technology, function, or platform. Collaborate with the Sales Pursuit and Delivery Teams to develop a winnable and deliverable solution that underpins the client value proposition and business case.
Must have skills: Grafana
Good to have skills: NA
Minimum 3 year(s) of experience is required
Educational Qualification: 15 years full time education
Job Overview:
We are looking for a skilled Prometheus Implementation and Support Engineer to join our IT/DevOps team. The successful candidate will be responsible for the deployment, configuration, and ongoing support of the Prometheus monitoring and alerting platform. This role requires strong technical expertise, problem-solving skills, and the ability to work effectively with various teams to ensure the performance and reliability of our infrastructure and applications.

Key Responsibilities:

Prometheus Implementation:
o Plan, design, and execute the deployment of Prometheus across multiple environments.
o Configure Prometheus to monitor key performance metrics, including custom dashboards, alerts, and data sources.
o Integrate Prometheus with Grafana for enhanced visualization and monitoring.

Support and Maintenance:
o Provide ongoing support for the Prometheus platform, ensuring continuous monitoring and optimal performance.
o Troubleshoot and resolve issues related to Prometheus configurations, data collection, and alerting.
o Perform regular upgrades and maintenance of the Prometheus platform.

Monitoring and Optimization:
o Monitor infrastructure and application performance in real-time, identifying and diagnosing performance issues.
o Work with development, operations, and infrastructure teams to implement performance improvements and optimizations.
o Conduct root cause analysis of performance issues and provide actionable recommendations.

Collaboration and Training:
o Collaborate with cross-functional teams (development, QA, operations) to integrate Prometheus into the monitoring and alerting lifecycle.
o Train and support team members in the use of Prometheus for monitoring and troubleshooting.

Reporting and Documentation:
o Generate and distribute regular reports on infrastructure and application performance.
o Maintain comprehensive documentation of Prometheus configurations, monitoring setups, and troubleshooting procedures.

Continuous Improvement:
o Stay current with the latest Prometheus features, best practices, and industry trends.
o Proactively suggest enhancements to improve monitoring capabilities and performance.

Qualifications:

Education:

Experience:
o Minimum [X] years of experience in performance monitoring and management, specifically with Prometheus.
o Proven experience in the end-to-end implementation and support of Prometheus in a complex environment.

Technical Skills:
o Strong knowledge of Prometheus, including installation, configuration, and customization.
o Experience with related technologies such as Grafana, Alertmanager, and exporters.
o Proficiency in programming or scripting languages such as Python, Shell, or PowerShell.
o Familiarity with cloud environments and container orchestration tools (e.g., Kubernetes, Docker).
o Understanding of network protocols and application architectures.
o Application and Infrastructure Monitoring: Expertise in monitoring applications and underlying infrastructure.
o Cloud Platform Knowledge: Understanding of cloud platforms (e.g., AWS, Azure, GCP) and their specific metrics for effective cloud monitoring.
o Ability to diagnose and resolve performance issues and conduct thorough RCA.
o Customization and Configuration: Proficiency in customizing dashboards, alerts, and reports to meet specific business and technical requirements.
o Scripting and Automation: Knowledge of scripting languages (e.g., Python, PowerShell) to automate tasks and enhance monitoring capabilities.
o Integration of Other Tools with Grafana
o Security and Compliance Awareness

Soft Skills:
o Excellent problem-solving and analytical skills.
o Strong communication and collaboration skills.
o Ability to work independently and manage multiple tasks effectively.
o Attention to detail and a proactive approach to identifying and resolving issues.

Preferred Qualifications:
o Prometheus certification(s).
o Experience with other monitoring tools like Grafana, Nagios, or Zabbix.
o Knowledge of DevOps practices and tools.
o Experience in performance testing and tuning.

Qualification: 15 years full time education