Posted: 1 day ago
Platform:
On-site
Contractual
One and done
ML Integration with the Grafana Observability Platform
Onshore and offshore
Develop machine learning and deep learning solutions for observability data to enhance IT operations. Implement time series forecasting, anomaly detection, and event correlation models. Integrate LLMs using prompt engineering, fine-tuning, and RAG for incident summarization. Build an MCP client-server architecture for seamless integration with the Grafana ecosystem.
Duties/Day to Day Overview
Machine Learning & Model Development
Design and develop ML/DL models for:
Time series forecasting (e.g., system load, CPU/memory usage)
Anomaly detection in logs, metrics, or traces
Event classification and correlation to reduce alert noise
Select, train, and tune models using frameworks like TensorFlow, PyTorch, or scikit-learn
Evaluate model performance using metrics like precision, recall, F1-score, and AUC
ML Pipeline Engineering
Build scalable data pipelines for training and inference (batch or streaming)
Preprocess large observability datasets from tools like Prometheus, Kafka, or BigQuery
Deploy models using cloud-native services (e.g., GCP Vertex AI, Azure ML, Docker/Kubernetes)
Maintain retraining pipelines and monitor for model drift
LLM Integration for Observability Intelligence
Implement LLM-based workflows for summarizing incidents or logs
Develop and refine prompts for GPT, LLaMA, or other large language models
Integrate Retrieval-Augmented Generation (RAG) with vector databases (e.g., FAISS, Pinecone)
Control latency, hallucinations, and cost in production LLM pipelines
Grafana & MCP Ecosystem Integration
Build or extend MCP client/server components for Grafana
Surface ML model outputs (e.g., anomaly scores, predictions) in observability dashboards
Collaborate with observability engineers to integrate ML insights into existing monitoring tools
Collaboration & Agile Delivery
Participate in daily stand-ups, sprint planning, and retrospectives
Collaborate with:
Data engineers on pipeline performance and data ingestion
Frontend developers for real-time data visualizations
SRE and DevOps teams for alert tuning and feedback loop integration
Translate model outputs into actionable insights for platform teams
Testing, Documentation & Version Control
Write unit, integration, and regression tests for ML code and pipelines
Maintain documentation on models, data sources, assumptions, and APIs
Use Git, CI/CD pipelines, and model versioning tools (e.g., MLflow, DVC)
Top Requirements
Design and develop machine learning algorithms and deep learning applications and systems for Observability data (AIOps)
Hands-on experience with time series forecasting/prediction and anomaly detection ML algorithms
Hands-on experience with event classification and correlation ML algorithms
Hands-on experience integrating LLMs using prompt engineering, fine-tuning, and RAG for effective summarization
Working knowledge of implementing MCP client and server components for the Grafana ecosystem, or similar exposure
Programming languages: Python, R
ML Frameworks: TensorFlow, PyTorch, scikit-learn
Cloud platforms: Google Cloud, Azure
Front-End Frameworks/Libraries: Experience with frameworks like React, Angular, or Vue.js, and libraries like jQuery.
Design Tools: Proficiency in design software like Figma, Adobe XD, or Sketch.
Databases: Knowledge of database technologies like MySQL, MongoDB, or PostgreSQL.
Server-Side Languages: Familiarity with server-side languages like Python, Node.js, or Java.
Version Control: Experience with Git and other version control systems.
Testing: Knowledge of testing frameworks and methodologies.
Agile Development: Experience with agile development methodologies.
Communication and Collaboration: Strong communication and collaboration skills.
Experience: Leads, 10 to 12 years (onshore and offshore); Developers/Engineers, 6 to 8 years.
Stier Solutions Inc
Delhi, India
Salary: Not disclosed