Job Description
As a Qualcomm Systems Engineer, you will be responsible for researching, designing, developing, simulating, and validating systems-level software, hardware, architecture, algorithms, and solutions to enable the development of cutting-edge technology. You will collaborate across functional teams to meet and exceed system-level requirements and standards.

Key Responsibilities:

Model Optimization & Quantization
- Optimize deep learning models using quantization (INT8, INT4, mixed precision), pruning, and knowledge distillation.
- Implement Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) for deployment.
- Work with inference toolchains such as TensorRT, ONNX Runtime, OpenVINO, and TVM.

AI Hardware Acceleration & Deployment
- Optimize AI workloads for Qualcomm Hexagon DSP, GPUs (CUDA, Tensor Cores), TPUs, NPUs, FPGAs, Habana Gaudi, and the Apple Neural Engine.
- Leverage Python APIs for hardware-specific acceleration, including cuDNN, XLA, and MLIR.
- Benchmark models on AI hardware architectures and debug performance issues.

AI Research & Innovation
- Conduct state-of-the-art research on AI inference efficiency, model compression, low-bit precision, sparse computing, and algorithmic acceleration.
- Explore new deep learning architectures (Sparse Transformers, Mixture of Experts, Flash Attention) for better inference performance.
- Contribute to open-source AI projects and publish findings at top-tier ML conferences (NeurIPS, ICML, CVPR).
- Collaborate with hardware vendors and AI research teams to optimize deep learning models for next-generation AI accelerators.

Qualifications Required:
- Bachelor's degree in Engineering, Information Systems, Computer Science, or a related field with 8+ years of Systems Engineering or related work experience; OR
- Master's degree in Engineering, Information Systems, Computer Science, or a related field with 7+ years of Systems Engineering or related work experience; OR
- PhD in Engineering, Information Systems, Computer Science, or a related field with 6+ years of Systems Engineering or related work experience.

Details of Expertise:
- Experience optimizing LLMs, LVMs, and LMMs for inference.
- Experience with deep learning frameworks: TensorFlow, PyTorch, JAX, ONNX.
- Advanced skills in model quantization, pruning, and compression.
- Proficiency in CUDA programming and Python GPU acceleration using CuPy, Numba, and TensorRT.
- Hands-on experience with ML inference runtimes (TensorRT, TVM, ONNX Runtime, OpenVINO).
- Experience working with runtime delegates (TFLite, ONNX, Qualcomm).
- Strong expertise in Python programming, writing optimized and scalable AI code.
- Experience debugging AI models, including examining computation graphs with Netron, TensorBoard, and the ONNX Runtime debugger.
- Strong debugging skills using profiling tools (PyTorch Profiler, TensorFlow Profiler, cProfile, Nsight Systems, perf, py-spy).
- Expertise in cloud-based AI inference (AWS Inferentia, Azure ML, GCP AI Platform, Habana Gaudi).
- Knowledge of hardware-aware optimizations (oneDNN, XLA, cuDNN, ROCm, MLIR).
- Contributions to the open-source community.
- Publications in international forums, conferences, or journals.