MLOps Architect

9 - 14 years

25 - 40 Lacs

Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

Observability & MLOps Engineer

Primary Focus

Observability & ML Lifecycle Management

Core Responsibilities

- Design observability stack
- Implement distributed tracing
- Build Grafana dashboards & alerts
- Integrate telemetry across clouds

Core Skills

- Metrics, logs, traces
- Grafana & alerting
- MLOps engineering
- Python/Scripting

Good-to-Have

- Airflow basics
- Multi-cloud observability

Overlap

- Python/Scripting
- Cloud familiarity

Senior Observability Specialist

Location: [Chennai, Pune, Bangalore]

Employment Type: Full-time

Experience Required: [15-18]

Job Summary:

We are seeking a highly skilled Senior Observability Specialist to design, implement, and manage end-to-end observability strategies across cloud and on-premises environments. This role requires expertise in modern monitoring, logging, and tracing tools to ensure system reliability, performance optimization, and proactive incident detection. The ideal candidate will have experience with Dynatrace, Datadog, and open-source solutions including Grafana, Loki, Tempo, Mimir, and Prometheus.

Key Responsibilities:

  • Design and implement full-stack observability architectures that provide seamless monitoring, logging, and tracing capabilities.
  • Define best practices for observability across hybrid cloud, multi-cloud, and on-premises environments.
  • Ensure scalability, availability, and resilience of monitoring solutions in high-traffic applications.

Monitoring & Dashboarding Architecture:

  • Architect Grafana-based observability platforms for real-time visualization and analysis of metrics.
  • Establish Prometheus-based metric collection pipelines optimized for high-volume environments.
  • Integrate Dynatrace and Datadog into cloud-native infrastructure for proactive monitoring.

Centralized Logging & Distributed Tracing:

  • Design and implement centralized logging solutions using Loki, ensuring efficient log ingestion, indexing, and querying.
  • Develop distributed tracing strategies with Tempo to enhance performance monitoring in microservices architectures.
  • Optimize Mimir-based metric storage for seamless data retrieval and scalability.

Observability Strategy & Automation:

  • Define and implement observability-driven DevOps methodologies to improve system reliability.
  • Lead automation initiatives for log analysis, alerting, and anomaly detection using machine learning models.
  • Architect automated alerting workflows using Prometheus Alertmanager, Dynatrace AI alerts, and Datadog event notifications.
  • Ensure efficient KPI tracking and proactive troubleshooting based on observability insights.

Scripting & API Integrations:

  • Develop custom API integrations using Python or Go to query, retrieve, and process monitoring data.
  • Architect event-driven observability pipelines for automated data collection and reporting.
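
As an illustration of the kind of Python integration described above, here is a minimal sketch that builds a Prometheus instant-query URL and processes the JSON it returns. It assumes the standard Prometheus HTTP API endpoint `/api/v1/query` and its documented instant-vector response shape. The host `prometheus.example`, the sample payload, and the helper names `build_query_url`, `parse_instant_vector`, and `down_targets` are hypothetical and are not taken from this posting.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: str) -> dict[str, float]:
    """Map each series' `instance` label to its current sample value."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    # Each sample is [unix_timestamp, "value-as-string"].
    return {
        series["metric"].get("instance", "unknown"): float(series["value"][1])
        for series in data["result"]
    }


def down_targets(values: dict[str, float]) -> list[str]:
    """Return the instances whose `up` metric is 0, sorted by name."""
    return sorted(name for name, value in values.items() if value == 0.0)


# Hypothetical response for the query `up`, used here instead of a live server.
SAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"__name__": "up", "instance": "app-1:9100"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "instance": "app-2:9100"},
             "value": [1700000000.0, "0"]}
          ]}}
"""

if __name__ == "__main__":
    print(build_query_url("http://prometheus.example:9090", "up"))
    print(down_targets(parse_instant_vector(SAMPLE_RESPONSE)))
```

In a real integration the sample payload would be replaced by the body of an HTTP GET against the URL from `build_query_url`, and the list of down targets could feed a report or an alerting workflow.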

DevOps & CI/CD Integration:

  • Collaborate with DevOps teams to integrate observability tooling within CI/CD pipelines.
  • Optimize system performance and resource utilization through proactive monitoring.
  • Advocate for best practices in observability-driven software development.

Cloud-Native Observability & DevOps Alignment:

  • Design observability strategies tailored for Kubernetes-based microservices and cloud-native architectures.
  • Collaborate with DevOps teams to embed observability practices within CI/CD pipelines for continuous monitoring.
  • Optimize logging and metrics pipelines to support containerized and serverless environments.

Qualifications & Skills

Architectural Focus:

  • Strong expertise in designing observability frameworks across Dynatrace, Datadog, Grafana, Loki, Tempo, Mimir, and Prometheus.
  • Proficiency in observability architecture, ensuring scalable and reliable monitoring solutions.
  • Advanced experience in scripting with Python or Go for custom API integrations.
  • Deep understanding of DevOps methodologies, CI/CD best practices, and cloud-native observability tools.
  • Experience in microservices architecture and distributed systems monitoring.
  • Ability to troubleshoot bottlenecks, optimize performance, and implement predictive observability insights.

Preferred Certifications (Optional):

  • Certified Kubernetes Administrator (CKA)
  • AWS Certified DevOps Engineer
  • Dynatrace Performance Monitoring Certification
  • Prometheus Certified Associate

Citiustech

IT Services and IT Consulting

Princeton NJ