Posted: 1 week ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description

SDE 2 / SDE 3 – AI Infrastructure & LLM Systems Engineer

Location: Pune / Bangalore (India)

Experience: 4–8 years

Compensation: No bar for the right candidate

Bonus: Up to 10% of base

About The Company

AbleCredit builds production-grade AI systems for BFSI enterprises, reducing OPEX by up to 70% across onboarding, credit, collections, and claims. We run our own LLMs on GPUs, operate high-concurrency inference systems, and build AI workflows that must scale reliably under real enterprise traffic.

Role Summary (What We’re Really Hiring For)

We are looking for a strong backend / systems engineer who can:
  • Deploy AI models on GPUs
  • Expose them via APIs
  • Scale inference under high parallel load using async systems and queues

This is not a prompt-engineering or UI-AI role.

Core Responsibilities

  • Deploy and operate LLMs on GPU infrastructure (cloud or on-prem).
  • Run inference servers such as vLLM / TGI / SGLang / Triton or equivalents.
  • Build FastAPI / gRPC APIs on top of AI models.
  • Design async, queue-based execution for AI workflows (fan-out, retries, backpressure).
  • Plan and reason about capacity & scaling:
      • GPU count vs RPS
      • batching vs latency
      • cost vs throughput
  • Add observability around latency, GPU usage, queue depth, failures.
  • Work closely with AI researchers to productionize models safely.
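
For a flavor of the async, queue-based execution named above (fan-out, retries, backpressure), here is a minimal sketch using only Python's standard `asyncio`. It is illustrative, not AbleCredit's implementation: `fake_infer`, `worker`, `run`, and the constants are hypothetical names, and `fake_infer` stands in for a real call to an inference server.

```python
import asyncio

MAX_RETRIES = 3   # attempts per request before giving up
QUEUE_SIZE = 4    # bounded queue: producers wait when it is full (backpressure)
NUM_WORKERS = 2   # fan-out: number of concurrent workers draining the queue

_seen = set()     # used only to make the fake model fail once per prompt


async def fake_infer(prompt):
    """Hypothetical stand-in for an inference call; fails once per prompt."""
    await asyncio.sleep(0.001)
    if prompt not in _seen:
        _seen.add(prompt)
        raise RuntimeError("transient failure")
    return prompt.upper()


async def worker(queue, results):
    """Pull prompts off the queue and retry failures with exponential backoff."""
    while True:
        prompt = await queue.get()
        try:
            for attempt in range(1, MAX_RETRIES + 1):
                try:
                    results[prompt] = await fake_infer(prompt)
                    break
                except RuntimeError:
                    if attempt == MAX_RETRIES:
                        results[prompt] = None  # record the permanent failure
                    else:
                        await asyncio.sleep(0.001 * 2 ** attempt)
        finally:
            queue.task_done()


async def run(prompts):
    """Feed prompts through a bounded queue served by a small worker pool."""
    queue = asyncio.Queue(maxsize=QUEUE_SIZE)
    results = {}
    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(NUM_WORKERS)
    ]
    for prompt in prompts:
        await queue.put(prompt)  # suspends here whenever the queue is full
    await queue.join()           # wait until every queued prompt is processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(run([f"req-{i}" for i in range(10)])))
```

The bounded queue is what provides backpressure in this sketch: when workers fall behind, `queue.put` suspends the producer instead of letting pending requests grow without limit.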

Must-Have Skills

  • Strong backend engineering fundamentals (distributed systems, async workflows).
  • Hands-on experience running GPU workloads in production.
  • Proficiency in Python (Golang acceptable).
  • Experience with Docker + Kubernetes (or equivalent).
  • Practical knowledge of queues / workers (Redis, Kafka, SQS, Celery, Temporal, etc.).
  • Ability to reason quantitatively about performance, reliability, and cost.

Strong Signals (Recruiter Screening Clues)

Look for candidates who have:

  • Personally deployed models on GPUs
  • Debugged GPU memory / latency / throughput issues
  • Scaled compute-heavy backends under load
  • Designed async systems instead of blocking APIs

Nice to Have

  • Familiarity with LangChain / LlamaIndex (as infra layers, not just usage).
  • Experience with vector DBs (Qdrant, Pinecone, Weaviate).
  • Prior work on multi-tenant enterprise systems.

Not a Fit If

  • Only experience is calling OpenAI / Anthropic APIs.
  • Primarily a prompt engineer or frontend-focused AI dev.
  • No hands-on ownership of infra, scaling, or production reliability.

Skills: Large Language Models (LLM), LLMOps, Generative AI and LLM tuning
