Software Architect Generative AI & LLM Systems

INTELLI SEARCH

7 - 12 years

20 - 35 Lacs

Noida, Hyderabad, Gurugram

Posted: 5 months ago

Skills Required

Generative AI, Architecture, Fullstack Development, LangChain, LLM, Cloud, GPU

Work Mode

Hybrid

Job Type

Full Time

Job Description

Job Location: Hyderabad, Noida, or Gurgaon

Job Overview

We are seeking a highly experienced, hands-on Software Architect to lead the design and deployment of Large Language Model (LLM)-powered applications across cloud and on-prem environments. This role demands deep expertise in full-stack software development, high-performance inference systems, and cutting-edge generative AI workflows. You will play a key role in scaling AI infrastructure, maximizing throughput, and educating cross-functional teams on best practices for building LLM-driven solutions.

Key Responsibilities

LLM Deployment & Infrastructure Design: Architect, deploy, and maintain LLMs on cloud-based GPU clusters (e.g., AWS, GCP, Azure) or on-premise hardware, including NVIDIA HGX and smaller GPU-accelerated instances. Bonus points for experience deploying containerized LLM applications in GPU clusters.

Performance Optimization on Software Layer: Optimize LLM serving stacks using frameworks such as vLLM, TensorRT-LLM, or DeepSpeed to improve inference throughput and reduce time-to-first-token latency.

Prompt Engineering & Optimization: Design, test, and refine prompts for LLMs to extract the highest-quality output. Mentor team members on prompt engineering strategies and few-shot examples.

Inference Efficiency & Scalability: Architect systems that sustain low latency and a short time-to-first-token even under high demand.

GenAI Application Architecture: Build and lead GenAI application development using LangChain, designing modular pipelines for agents, tools, and memory systems. Define architectural patterns and reusable workflows.

Team Enablement & Education: Educate and upskill engineering teams on best practices in GenAI development, inference performance, and prompt design through documentation, workshops, and code reviews.

RAG with SQL-based Systems: Design and implement retrieval-augmented generation (RAG) pipelines that leverage SQL-like structured databases for high-relevance grounding.

Vector Database Integration (Nice-to-Have): Architect and optimize RAG systems using vector embeddings and specialized vector databases such as FAISS, Weaviate, or Pinecone.

Requirements

Must-Have Skills:

7+ years of full-stack development and software architecture experience

Proven track record deploying LLMs in production, in both on-premise and cloud GPU environments

Strong hands-on experience with vLLM, LangChain, and model serving performance tuning

Deep knowledge of prompt engineering, token economy, and optimizing LLM behavior

Experience designing and scaling inference pipelines for latency and throughput

Strong experience with Python and either TypeScript or Golang

Familiarity with deploying applications to hyperscalers (AWS, GCP, Azure)

Strong knowledge of SQL databases and data retrieval strategies for grounding LLM responses

Nice-to-Have Skills:

Experience with vector databases and embedding-based retrieval in RAG pipelines

Experience orchestrating containerized LLM deployments using Kubernetes or Ray

Familiarity with streaming inference systems and token-by-token UX optimizations

Background in AI/ML systems, MLOps, or research-to-prod workflows

Contact: 95134 87487

More Jobs at INTELLI SEARCH

Software Architect Online Travel Agency OTA Portal

Gurugram

12 - 22 yrs

INR 50 - 70 Lacs

Senior Quality Engineer/ Quality Manager

Bhiwadi

10 - 20 yrs

INR 18 - 27 Lacs

Business Development Representative UK Shift WFO

Bengaluru

3 - 8 yrs

INR 5 - 10 Lacs

Data Architect (Solution Architect) - Gurgaon/Noida

Noida, Gurugram

8 - 13 yrs

INR 40 - 45 Lacs

Data Engineering Manager-Gurgaon/Noida/Hyderabad

Noida, Hyderabad, Gurugram

12 - 19 yrs

INR 45 - 55 Lacs

INTELLI SEARCH

Technology / Data Analytics

Innovation City
