On-site | Full Time
Job Specification: AI Platform Engineer
About the Role
We are seeking an AI Platform Engineer to build and scale the infrastructure that powers
our production AI services. You will take cutting-edge models—ranging from speech
recognition (ASR) to large language models (LLMs)—and deploy them into highly
available, developer-friendly APIs.
You will build the bridge between the R&D team that trains the models and the
applications that consume them. This means developing robust APIs, deploying
and optimizing models on Triton Inference Server (or similar frameworks), and ensuring
real-time, scalable inference.
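
To make the shape of this work concrete, the following is a minimal sketch (an illustration only, not the team's actual implementation) of the pattern described above: a FastAPI endpoint that forwards a request to a model hosted on Triton Inference Server. The model name and tensor names ("example_model", "TEXT", "SCORE") are hypothetical placeholders and would need to match the real model configuration.

# Minimal sketch: FastAPI endpoint fronting a Triton-hosted model.
# Model and tensor names below are hypothetical and must match config.pbtxt.
import numpy as np
import tritonclient.http as triton_http
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
triton = triton_http.InferenceServerClient(url="localhost:8000")  # Triton's default HTTP port

class InferRequest(BaseModel):
    text: str

@app.post("/v1/score")
def score(req: InferRequest):
    # Pack the request text as a single-element BYTES tensor.
    data = np.array([req.text.encode("utf-8")], dtype=np.object_)
    inp = triton_http.InferInput("TEXT", [1], "BYTES")
    inp.set_data_from_numpy(data)
    out = triton_http.InferRequestedOutput("SCORE")
    result = triton.infer(model_name="example_model", inputs=[inp], outputs=[out])
    return {"score": result.as_numpy("SCORE").tolist()}
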
Responsibilities
● API Development
○ Design, build, and maintain production-ready APIs for speech, language, and
other AI models.
○ Provide SDKs and documentation to enable easy developer adoption.
● Model Deployment
○ Deploy models (ASR, LLM, and others) using Triton Inference Server or
similar systems.
○ Optimize inference pipelines for low-latency, high-throughput workloads (a
streaming sketch follows this list).
● Scalability & Reliability
○ Architect infrastructure for handling large-scale, concurrent inference
requests.
○ Implement monitoring, logging, and auto-scaling for deployed services.
● Collaboration
○ Work with research teams to productionize new models.
○ Partner with application teams to deliver AI functionality seamlessly through
APIs.
● DevOps & Infrastructure
○ Automate CI/CD pipelines for models and APIs.
○ Manage GPU-based infrastructure in cloud or hybrid environments.
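
The real-time requirement in the deployment and scalability bullets above typically means streaming transport rather than plain request/response. As an illustration only, here is a minimal sketch of a WebSocket streaming endpoint in FastAPI for ASR-style partial results; transcribe_chunk is a hypothetical stand-in for the actual model call.

# Minimal sketch of a real-time streaming endpoint (WebSocket) for ASR-style inference.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def transcribe_chunk(audio: bytes) -> str:
    # Hypothetical placeholder: in production this would call the deployed model
    # (e.g. a Triton streaming endpoint) and return a partial transcript.
    return f"<{len(audio)} bytes transcribed>"

@app.websocket("/v1/asr/stream")
async def asr_stream(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            chunk = await ws.receive_bytes()           # raw audio chunk from the client
            partial = await transcribe_chunk(chunk)    # partial hypothesis for this chunk
            await ws.send_json({"partial": partial})   # push the partial result back
    except WebSocketDisconnect:
        pass  # client closed the stream
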
Requirements
● Core Skills
○ Strong programming experience in Python (FastAPI, Flask) and/or
Go/Node.js for API services.
○ Hands-on experience with model deployment using Triton Inference Server,
TorchServe, or similar.
○ Familiarity with both ASR and LLM frameworks (Hugging Face Transformers,
TensorRT-LLM, vLLM, etc.).
● Infrastructure
○ Experience with Docker, Kubernetes, and managing GPU-accelerated
workloads.
○ Deep knowledge of real-time inference systems (REST, gRPC, WebSockets,
streaming).
○ Cloud experience (AWS, GCP, Azure).
● Bonus
○ Experience with model optimization (quantization, distillation, TensorRT,
ONNX); a brief export sketch follows below.
○ Exposure to MLOps tools for deployment and monitoring.
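
On the optimization bonus above, one common first step is exporting a PyTorch checkpoint to ONNX before applying TensorRT conversion or quantization. A minimal sketch, using a hypothetical toy model in place of a real checkpoint:

# Minimal sketch: export a (hypothetical) PyTorch model to ONNX as a starting
# point for TensorRT conversion or post-training quantization.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2)).eval()
dummy = torch.randn(1, 16)  # example input defining the traced shape

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["features"],
    output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}, "logits": {0: "batch"}},  # allow variable batch size
    opset_version=17,
)
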
Job Types: Full-time, Permanent
Pay: From ₹50,000.00 per month
Work Location: In person
Guardian management services