7 - 12 years
20 - 35 Lacs
Posted: 1 month ago
Hybrid
Full Time
Software Architect, Generative AI & LLM Systems
Job location: Hyderabad, Noida, or Gurgaon

Job Overview
We are seeking a highly experienced, hands-on Software Architect to lead the design and deployment of Large Language Model (LLM)-powered applications across cloud and on-prem environments. This role demands deep expertise in full-stack software development, high-performance inference systems, and cutting-edge generative AI workflows. You will play a key role in scaling AI infrastructure, maximizing throughput, and educating cross-functional teams on best practices for building LLM-driven solutions.

Key Responsibilities
LLM Deployment & Infrastructure Design: Architect, deploy, and maintain LLMs on cloud-based GPU clusters (e.g., AWS, GCP, Azure) or on-premise hardware, including NVIDIA HGX and smaller GPU-accelerated instances. Bonus points for experience deploying containerized LLM applications in GPU clusters.
Performance Optimization on Software Layer: Optimize LLM serving stacks using frameworks such as vLLM, TensorRT-LLM, or DeepSpeed to improve inference throughput and reduce time-to-first-token latency.
Prompt Engineering & Optimization: Design, test, and refine prompts for LLMs to extract the highest-quality output. Mentor team members on prompt engineering strategies and few-shot examples.
Inference Efficiency & Scalability: Architect systems that sustain low latency and a short time-to-first-token even under high demand.
GenAI Application Architecture: Build and lead GenAI application development using LangChain, designing modular pipelines for agents, tools, and memory systems. Define architectural patterns and reusable workflows.
Team Enablement & Education: Educate and upskill engineering teams on best practices in GenAI development, inference performance, and prompt design through documentation, workshops, and code reviews.
RAG with SQL-based Systems: Design and implement retrieval-augmented generation (RAG) pipelines that leverage SQL-like structured databases for high-relevance grounding.
Vector Database Integration (Nice-to-Have): Architect and optimize RAG systems using vector embeddings and specialized vector databases such as FAISS, Weaviate, or Pinecone.

Requirements
Must-Have Skills:
7+ years of full-stack development and software architecture experience
Proven track record of deploying LLMs in production in both on-premise and cloud GPU environments
Strong hands-on experience with vLLM, LangChain, and model-serving performance tuning
Deep knowledge of prompt engineering, token economy, and optimizing LLM behavior
Experience designing and scaling inference pipelines for latency and throughput
Strong experience with Python and either TypeScript or Golang
Familiarity with deploying applications to hyperscalers (AWS, GCP, Azure)
Strong knowledge of SQL databases and data retrieval strategies for grounding LLM responses

Nice-to-Have Skills:
Experience with vector databases and embedding-based retrieval in RAG pipelines
Experience orchestrating containerized LLM deployments using Kubernetes or Ray
Familiarity with streaming inference systems and token-by-token UX optimizations
Background in AI/ML systems, MLOps, or research-to-prod workflows

Contact: 95134 87487
INTELLI SEARCH