Posted: 9 hours ago
On-site | Full-time
We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute for running large language models such as DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.
We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems. We primarily work in-person from our office in downtown San Francisco.
**Responsibilities**
- Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models
- Deploy and maintain large language models at scale in production environments
- Deploy new models as they are released by frontier labs
- Implement techniques like quantization, speculative decoding, and KV cache reuse
- Contribute regularly to open source projects such as SGLang and vLLM
- Deep dive into underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues
- Collaborate with the engineering team to bring new features and capabilities to our inference platform
- Develop robust and scalable infrastructure for AI model serving
- Create and maintain technical documentation for inference systems
**Requirements**
- 3+ years of experience writing high-performance, production-quality code
- Strong proficiency with Python and deep learning frameworks, particularly PyTorch
- Demonstrated experience with LLM inference optimization techniques
- Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred
- Familiarity with Docker and Kubernetes for containerized deployments
- Experience with CUDA programming and GPU optimization
- Strong understanding of distributed systems and scalability challenges
- Proven track record of optimizing AI models for production environments
**Nice to Have**
- Familiarity with TensorRT and TensorRT-LLM
- Knowledge of vision models and multimodal AI systems
- Experience implementing techniques like quantization and speculative decoding
- Contributions to open source machine learning projects
- Experience with large-scale distributed computing
**Compensation**
We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 – $250,000, plus competitive equity and benefits including:
- Full healthcare coverage
- Quarterly offsites
- Flexible PTO
Speak with the employer: FullThrottle Labs Pvt Ltd, +91 9008078505