AI/ML Engineer (LLM & Observability)

3 - 7 years

0 Lacs

Posted: 5 days ago | Platform: Shine


Work Mode

On-site

Job Type

Full Time

Job Description

As an experienced AI/ML Engineer, you will design and implement intelligent systems that use large language models (LLMs) for observability and operational workflows. You will focus on building AI-driven solutions for anomaly detection, alert triage, and automated root cause analysis in production environments.

**Key Responsibilities:**

- **LLM Development & Deployment**
  - Design and deploy large language model solutions using Ollama, AWS Bedrock, or similar platforms
  - Implement agentic networks and multi-agent systems for complex operational workflows
  - Develop context-based prompt engineering strategies for production LLM applications
  - Fine-tune and optimize LLMs using LoRA (Low-Rank Adaptation) techniques
  - Integrate Model Context Protocol (MCP) for enhanced agent communication
- **Observability & Operations**
  - Build AI-driven observability workflows for monitoring and operational intelligence
  - Design and implement intelligent alerting systems that reduce noise and improve signal quality
  - Develop automated alert triage mechanisms using ML-based classification
  - Create root cause analysis systems that leverage LLMs for faster incident resolution
  - Integrate LLM capabilities into existing operational pipelines and monitoring infrastructure
- **System Integration & Optimization**
  - Develop robust APIs and integrations for LLM-powered features
  - Optimize model performance for latency-sensitive operational use cases
  - Implement monitoring and evaluation frameworks for LLM outputs
  - Build feedback loops to continuously improve model accuracy and relevance

**Required Qualifications:**

- **Technical Skills**
  - Programming: Strong proficiency in Python, with experience writing production-grade code
  - ML Frameworks: Hands-on experience with PyTorch, TensorFlow, LangChain, or similar frameworks
  - LLM Deployment: Practical experience deploying LLMs using platforms like Ollama, AWS Bedrock, OpenAI API, Anthropic Claude API, or similar
  - Prompt Engineering: Proven ability to design effective prompts and context strategies
  - Version Control: Git/GitHub proficiency
- **AI/ML Experience**
  - 3+ years of experience in machine learning or AI engineering
  - Demonstrated experience working with large language models in production
  - Understanding of transformer architectures and attention mechanisms
  - Experience with model evaluation, testing, and performance optimization
  - Familiarity with vector databases and semantic search (Pinecone, Weaviate, ChromaDB, etc.)
- **System Architecture**
  - Experience building scalable ML systems and APIs
  - Understanding of microservices architecture and containerization (Docker, Kubernetes)
  - Familiarity with cloud platforms (AWS, GCP, or Azure)
  - Knowledge of CI/CD pipelines for ML models

**Additional Details:**

This full-time role offers benefits including food provision, health insurance, life insurance, paid sick time, and paid time off.

If you join us, you will work on near-term projects such as deploying LLM-powered alert triage systems, building intelligent incident summarization using context-aware prompts, implementing anomaly detection for production metrics using ML models, and creating automated root cause analysis workflows.

In the long term, you will be involved in developing self-healing systems that use LLMs for autonomous remediation, building predictive models for proactive issue detection, creating natural language interfaces for operational queries, and designing multi-agent systems for complex operational decision-making.
