Inference Optimization Engineer (LLM and Runtime)

2 - 6 years

0 Lacs

Posted: 4 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview: You are a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) at Sustainability Economics.ai, responsible for designing, developing, and optimizing cutting-edge AI systems. Your role involves integrating AI-driven cloud solutions with sustainable energy to create scalable, intelligent ecosystems that drive efficiency and innovation across industries. Collaborating with cross-functional teams, you will build production-ready AI solutions that address real-world business challenges and keep the company's platforms at the forefront of AI innovation.

Key Responsibilities:
- Optimizing and customizing large-scale generative models (LLMs) for efficient inference and serving.
- Applying and evaluating advanced model optimization techniques such as quantization, pruning, distillation, tensor parallelism, and caching strategies to improve model efficiency, throughput, and inference performance.
- Implementing custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.
- Optimizing runtime performance of inference stacks using frameworks like vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
- Designing and implementing scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
- Working closely with platform and infrastructure teams to reduce latency, memory footprint, and cost-per-token during production inference.
- Evaluating hardware-software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.
- Monitoring and profiling performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.

Qualifications Required:
- Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).
- 2-3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.

Additional Company Details: Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI. Guided by exceptional leaders, the company is committed to making long-term efforts to fulfill its vision through technical innovation, client services, expertise, and capability expansion. Joining the team at Sustainability Economics.ai offers you the opportunity to shape a first-of-its-kind AI + clean energy platform, work with a mission-driven team obsessed with impact, and leave your mark at the intersection of AI and sustainability.
