We are looking for a dynamic and passionate hands-on senior contributor to join Intel's AI Group.
Day-to-day work involves contributing to open-source AI frameworks such as PyTorch and inference-serving frameworks like vLLM and SGLang. The role includes designing, developing, and optimizing features for Intel's AI framework software stack, targeting Intel's AI accelerators and next-generation GPUs.
Roles and Responsibilities include:
- Design and develop software features for AI frameworks, both hardware-agnostic and hardware-aware.
- Enhance and extend deep learning inference and training capabilities in the software stack.
- Analyze and architect state-of-the-art features across different frameworks and drive development across the full software stack.
- Identify optimization opportunities in the software stack to improve the performance of deep learning workloads.
- Participate in discussions with the open-source community, contribute to development, and upstream software enhancements.
Qualifications:
- B.Tech or M.S./M.Tech in CS, ECE, or related fields with 6-12 years of overall experience.
- Proficient in Python-based complex software implementations; intermediate knowledge of advanced C++ (C++14/17) and parallel programming.
- In-depth, hands-on experience with frameworks such as PyTorch, vLLM, and SGLang.
- Experience with advanced inference-serving features such as disaggregated serving, quantization, speculative decoding, and constrained decoding.
- Strong understanding of LLMs.
- Practical knowledge of deep learning models for image and video generation is desirable.
- Ability to debug complex issues in multi-layered software systems; understanding of software integration in large open-source frameworks.
- Strong understanding of computer architecture and HW-SW optimization techniques.
- Effective communication skills and experience working in cross-geo teams.
- Ability to perform performance analysis of code on both host and accelerators/GPUs using open-source and proprietary profilers.
- Understanding of the competitive landscape for technologies in this domain.
Preferred:
- Experience developing and integrating CUTLASS or Triton-based kernels for deep learning.
- Knowledge of compiler algorithms for heterogeneous systems and fuser optimizations.