Sr/Lead ML Engineer

Experience: 10-12 years

Posted: 1 day ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Contractual

Job Description

Placement type (FTE/C/CTH): C/CTH (contract or contract-to-hire)

Duration: 6 months with extension

Location: Phoenix, AZ; must be on-site 5 days a week

Start Date: 2 weeks from the offer


Interview Process

One and done (a single interview round)

Reason for Position

Integrating ML into the Grafana observability platform

Team Overview

The team is distributed across onshore and offshore locations.

Project Description

AI/ML for Observability (AIOps)

The project develops machine learning and deep learning solutions for observability data to enhance IT operations. It covers time series forecasting, anomaly detection, and event correlation models; LLM integration using prompt engineering, fine-tuning, and retrieval-augmented generation (RAG) for incident summarization; and an MCP (Model Context Protocol) client-server architecture for integration with the Grafana ecosystem.

Duties/Day-to-Day Overview

Machine Learning & Model Development

Design and develop ML/DL models for:

Time series forecasting (e.g., system load, CPU/memory usage)

Anomaly detection in logs, metrics, or traces

Event classification and correlation to reduce alert noise

Select, train, and tune models using frameworks like TensorFlow, PyTorch, or scikit-learn

Evaluate model performance using metrics like precision, recall, F1-score, and AUC

ML Pipeline Engineering

Build scalable data pipelines for training and inference (batch or streaming)

Preprocess large observability datasets from tools like Prometheus, Kafka, or BigQuery

Deploy models using cloud-native services (e.g., GCP Vertex AI, Azure ML, Docker/Kubernetes)

Maintain retraining pipelines and monitor for model drift

LLM Integration for Observability Intelligence

Implement LLM-based workflows for summarizing incidents or logs

Develop and refine prompts for GPT, LLaMA, or other large language models

Integrate Retrieval-Augmented Generation (RAG) with vector databases (e.g., FAISS, Pinecone)

Control latency, hallucinations, and cost in production LLM pipelines

Grafana & MCP Ecosystem Integration

Build or extend MCP client/server components for Grafana

Surface ML model outputs (e.g., anomaly scores, predictions) in observability dashboards

Collaborate with observability engineers to integrate ML insights into existing monitoring tools

Collaboration & Agile Delivery

Participate in daily stand-ups, sprint planning, and retrospectives

Collaborate with:

Data engineers on pipeline performance and data ingestion

Frontend developers for real-time data visualizations

SRE and DevOps teams for alert tuning and feedback loop integration

Translate model outputs into actionable insights for platform teams

Testing, Documentation & Version Control

Write unit, integration, and regression tests for ML code and pipelines

Maintain documentation on models, data sources, assumptions, and APIs

Use Git, CI/CD pipelines, and model versioning tools (e.g., MLflow, DVC)

Top Requirements (Must-Haves)

AI/ML Engineer Skills

Design and develop machine learning algorithms and deep learning applications and systems for observability data (AIOps)

Hands-on experience with time series forecasting/prediction and anomaly detection ML algorithms

Hands-on experience with event classification and correlation ML algorithms

Hands-on experience integrating LLMs using prompt engineering, fine-tuning, and RAG for effective summarization

Working knowledge of implementing MCP clients and servers for the Grafana ecosystem, or similar exposure

Key Skills:

Programming languages: Python, R

ML Frameworks: TensorFlow, PyTorch, scikit-learn

Cloud platforms: Google Cloud, Azure

Front-End Frameworks/Libraries: Experience with frameworks like React, Angular, or Vue.js, and libraries like jQuery.

Design Tools: Proficiency in design software like Figma, Adobe XD, or Sketch.

Databases: Knowledge of database technologies like MySQL, MongoDB, or PostgreSQL.

Server-Side Languages: Familiarity with server-side languages and runtimes such as Python, Java, or Node.js.

Version Control: Experience with Git and other version control systems.

Testing: Knowledge of testing frameworks and methodologies.

Agile Development: Experience with agile development methodologies.

Communication and Collaboration: Strong communication and collaboration skills.

Experience: Lead: 10 to 12 years (onshore and offshore). Developers/Engineers: 6 to 8 years.
