Software Architect, Generative AI & LLM Systems

7 - 12 years

20 - 35 Lacs

Posted: 1 month ago | Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

Job location: Hyderabad, Noida or Gurgaon

Job Overview

We are seeking a highly experienced, hands-on Software Architect to lead the design and deployment of Large Language Model (LLM)-powered applications across cloud and on-prem environments. This role demands deep expertise in full-stack software development, high-performance inference systems, and cutting-edge generative AI workflows. You will play a key role in scaling AI infrastructure, maximizing throughput, and educating cross-functional teams on best practices for building LLM-driven solutions.

Key Responsibilities

  • LLM Deployment & Infrastructure Design: Architect, deploy, and maintain LLMs on cloud-based GPU clusters (e.g., AWS, GCP, Azure) or on-premise hardware, including NVIDIA HGX and smaller GPU-accelerated instances. Bonus points for experience deploying containerized LLM applications in GPU clusters.
  • Performance Optimization at the Software Layer: Optimize LLM serving stacks using frameworks such as vLLM, TensorRT-LLM, or DeepSpeed to improve inference throughput and reduce time-to-first-token latency.
  • Prompt Engineering & Optimization: Design, test, and refine prompts for LLMs to extract the highest-quality output. Mentor team members on prompt engineering strategies and few-shot examples.
  • Inference Efficiency & Scalability: Architect systems that sustain low latency and a short time-to-first-token even under high demand.
  • GenAI Application Architecture: Build and lead GenAI application development using LangChain, designing modular pipelines for agents, tools, and memory systems. Define architectural patterns and reusable workflows.
  • Team Enablement & Education: Educate and upskill engineering teams on best practices in GenAI development, inference performance, and prompt design through documentation, workshops, and code reviews.
  • RAG with SQL-based Systems: Design and implement retrieval-augmented generation (RAG) pipelines that leverage SQL-like structured databases for high-relevance grounding.
  • Vector Database Integration (Nice-to-Have): Architect and optimize RAG systems using vector embeddings and specialized vector databases such as FAISS, Weaviate, or Pinecone.

Requirements

Must-Have Skills:

  • 7+ years of full-stack development and software architecture experience
  • Proven track record of deploying LLMs in production, in both on-premise and cloud GPU environments
  • Strong hands-on experience with vLLM, LangChain, and model serving performance tuning
  • Deep knowledge of prompt engineering, token economy, and optimizing LLM behavior
  • Experience designing and scaling inference pipelines for latency and throughput
  • Strong experience with Python and either TypeScript or Golang
  • Familiarity with deploying applications to hyperscalers (AWS, GCP, Azure)
  • Strong knowledge of SQL databases and data retrieval strategies for grounding LLM responses

Nice-to-Have Skills:

  • Experience with vector databases and embedding-based retrieval in RAG pipelines
  • Experience orchestrating containerized LLM deployments using Kubernetes or Ray
  • Familiarity with streaming inference systems and token-by-token UX optimizations
  • Background in AI/ML systems, MLOps, or research-to-prod workflows

Contact: 95134 87487

Technology / Data Analytics

Innovation City

50+ Employees

62 Jobs

    Key People

  • Jane Doe

    CEO
  • John Smith

    CTO
