Platform Engineer (Site Reliability / DevOps Engineer)

8 - 13 years

25 - 30 Lacs

Posted: 1 day ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description


Responsibilities/What You'll Do:


Platform Design and Architecture: Build and operate a highly available, scalable, modular AI platform using technologies such as Qdrant, Anyscale, and Ray to support LLM orchestration, vector search, and multi-agent frameworks.
Core Infrastructure Development: Build essential APIs and infrastructure to power conversational applications, AI agents, and analytics tools.
LLM Operational Solutions: Implement workflows for Large Language Models, including inference pipelines, fine-tuning, caching, and evaluation for open-weight and hosted models.

Deployment & Performance Optimization: Deploy AI services on AWS with Kubernetes (EKS), Lambda, and ECS, ensuring scalability and resilience while optimizing vector databases and model runtimes for cost and performance.
Collaboration, Governance, & Mentorship: Partner with engineering and research teams to deliver production-grade, self-healing, performance-optimized services for AI/RAG pipelines; establish governance and security standards; and mentor junior engineers in AI infrastructure best practices and reviews.

What We're Looking For (Minimum Qualifications):

8+ years of experience as a Platform Engineer (Site Reliability / DevOps Engineer), with at least 3 years in AI/ML platform development (MLOps).
Deep expertise in Python, with strong design and debugging skills.
Ability to work independently and lead complex projects, with excellent problem-solving, analytical, and communication skills.
Proficiency with cloud platforms such as AWS, GCP, or Azure; familiarity with MLOps/AI DevOps tools such as MLflow or Kubeflow; and proficiency in CI/CD and infrastructure as code (Terraform / CloudFormation).
Hands-on expertise with CI/CD pipelines, model observability, and incident response for AI/ML services.

Preferred Qualifications:

Experience implementing and optimizing platforms that support large language model (LLM) pipelines with frameworks such as LangChain, LlamaIndex, Hugging Face Transformers, or similar.
Hands-on knowledge of setting up and scaling vector DB platforms such as Qdrant (or other vector DBs like Pinecone or Weaviate) for semantic search and embeddings management.
Exposure to MLOps tools, Ray.io, Anyscale, or other distributed orchestration and inference frameworks.
Experience developing and deploying containerized applications using Docker and Kubernetes, including Helm charts and automated scaling.
Understanding of LLMOps patterns: model registry, prompt versioning, and feedback loops.
