4 - 6 years
9 - 12 Lacs
Posted: 1 day ago
Work from Office
Full Time
The candidate will be responsible for the maintenance, optimization, and day-to-day operations of our open-source observability platform. The role requires expertise in Prometheus, Grafana, the ELK Stack (Elasticsearch, Logstash, Kibana), and OpenTelemetry to ensure the health, performance, and reliability of our critical systems and applications. The candidate will play a key role in designing, configuring, and implementing observability solutions, triaging and resolving observability-related issues, developing custom dashboards and alerts, and collaborating with development and operations teams to enhance our monitoring capabilities.
Platform Management & Maintenance:
* Administer, maintain, and optimize existing Prometheus, Grafana, and ELK Stack deployments, ensuring high availability and performance.
* Perform regular upgrades, patching, and configuration management of observability tools.
* Monitor the health and performance of the observability infrastructure itself, proactively identifying and resolving issues.
* Manage data retention, storage, and archiving strategies for metrics, logs, and traces.
Monitoring & Alerting:
* Design, configure, and implement monitoring solutions using Prometheus and Grafana for various applications, services, and infrastructure components.
* Develop and refine PromQL queries to extract meaningful insights from time-series data.
* Configure and manage alerting rules in Prometheus Alertmanager and Grafana to ensure timely notification of critical events.
* Collaborate with development teams to define appropriate metrics, logging standards, and tracing instrumentation.
Logging & Tracing:
* Manage and optimize ELK Stack for centralized log aggregation, analysis, and visualization.
* Configure and implement Logstash pipelines for efficient data ingestion and transformation.
* Develop Kibana dashboards and searches for effective log correlation and troubleshooting.
* Design, implement, and manage distributed tracing solutions using OpenTelemetry, ensuring end-to-end visibility across microservices.
* Assist development teams in adopting OpenTelemetry for comprehensive application instrumentation.
Troubleshooting & Support (L2 Focus):
* Serve as an L2 escalation point for observability-related incidents, performing root cause analysis and implementing solutions.
* Debug and resolve issues related to data collection, processing, visualization, and alerting.
* Provide guidance and support to development and operations teams on how to effectively use observability tools for troubleshooting and performance analysis.
* Create and maintain comprehensive documentation, runbooks, and troubleshooting guides.
Dashboarding & Visualization:
* Develop, customize, and maintain Grafana dashboards to provide actionable insights into system performance, application health, and business metrics.
* Create meaningful visualizations and reports for various stakeholders.
Collaboration & Improvement:
* Work closely with SRE, DevOps, Development, and Infrastructure teams to integrate observability best practices throughout the software development lifecycle.
* Participate in on-call rotations as needed to support critical observability infrastructure.
* Continuously research and evaluate new open-source observability tools and technologies to improve our capabilities.
* Contribute to the automation of observability tasks and workflows.
Required Skills and Experience:
* Communication: Strong communication and interpersonal skills, with the ability to collaborate effectively with technical and non-technical stakeholders.
Nice to Have:
* Experience with Infrastructure as Code (IaC) tools like Terraform or Ansible for managing observability infrastructure.
* Familiarity with other observability tools like Loki, Tempo, Jaeger, or similar.
* Understanding of ITIL processes and incident management.
* Experience with CI/CD pipelines and integrating observability into the deployment process.
Indian Financial Technology And Allied Services