We are building a next-generation AI Compiler and Runtime Stack targeting a QEMU-emulated RISC-V-based NPU architecture. 
  You will work on cutting-edge open-source technologies like MLIR, IREE, and LLVM, designing advanced compiler flows for scalar, vector (RVV), and matrix multiplication cores powering real-world Automotive, Robotics, and AR/VR AI workloads. 
  This is a rare opportunity to work on an end-to-end AI system from model ingestion (PyTorch, TensorFlow, ONNX) through optimized codegen and runtime deployment. 
  Why Join Us? 
  Build a full-stack AI platform from the ground up. 
  Work on next-generation RISC-V AI accelerators. 
  Contribute upstream to MLIR, LLVM, and IREE. 
  Responsibilities: 
  Develop custom MLIR dialects, passes, and transformations for optimizing AI models (a minimal pass skeleton is sketched after this list). 
  Extend the LLVM RISC-V backend to support new instructions. 
  Integrate hand-optimized microkernels (ukernels) into IREE for critical AI operations (matmul, convolution, reductions). 
  Build compiler optimizations for loop unrolling, fusion, and vectorization, and for SRAM-aware memory access optimization. 
  Lower AI models from high-level ML frameworks through MLIR to LLVM IR, generating RISC-V assembly. 
  Enhance IREE's codegen and runtime scheduling for scalar, vector, and matrix cores. 
  Collaborate with hardware architects, runtime developers, and QEMU platform engineers. 
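  For candidates new to MLIR's pass infrastructure, the sketch below shows roughly what a custom pass looks like. It is a minimal, hypothetical example: the pass name (count-matmuls), the class CountMatmulsPass, and the matmul-counting logic are illustrative only, not part of our codebase.

    // Minimal MLIR pass sketch: walk a function and count linalg.matmul ops.
    // The pass name and diagnostic are hypothetical, for illustration only.
    #include "mlir/Dialect/Func/IR/FuncOps.h"
    #include "mlir/Pass/Pass.h"

    namespace {
    struct CountMatmulsPass
        : public mlir::PassWrapper<CountMatmulsPass,
                                   mlir::OperationPass<mlir::func::FuncOp>> {
      llvm::StringRef getArgument() const override { return "count-matmuls"; }
      llvm::StringRef getDescription() const override {
        return "Counts linalg.matmul ops (illustrative example)";
      }
      void runOnOperation() override {
        int64_t numMatmuls = 0;
        // Walk every op nested under the function and match by name.
        getOperation().walk([&](mlir::Operation *op) {
          if (op->getName().getStringRef() == "linalg.matmul")
            ++numMatmuls;
        });
        getOperation().emitRemark() << "found " << numMatmuls << " matmul ops";
      }
    };
    } // namespace

  Registered via mlir::PassRegistration, a pass like this can be exercised from mlir-opt; real transformation passes follow the same skeleton but rewrite the IR instead of just walking it.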
  
  Qualifications: 
  Prior work with AI/ML workloads (Vision Transformers, Object Detection, ASR, etc.). 
  Familiarity with ping-pong buffering in SRAM and optimizing for memory bandwidth (a minimal double-buffering sketch follows this list). 
  Contributions to open-source projects such as MLIR, LLVM, or IREE. 
  Understanding of AI model optimization techniques (quantization, tiling, scheduling). 
  Experience with profiling and debugging low-level systems.
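
  Ping-pong buffering here means double-buffering tiles in SRAM so that the DMA transfer of the next tile overlaps with compute on the current one. Below is a minimal sketch under stated assumptions: dma_start_load, dma_wait, compute_tile, and kTileBytes are hypothetical placeholders, not APIs from our stack.

    // Ping-pong (double) buffering sketch: overlap the DMA load of tile i+1
    // with compute on tile i. All primitives below are hypothetical.
    #include <cstddef>
    #include <cstdint>

    constexpr std::size_t kTileBytes = 4096;

    // Assumed platform primitives (placeholders, not real APIs):
    void dma_start_load(std::uint8_t *dst, const std::uint8_t *src, std::size_t n);
    void dma_wait();  // blocks until the one pending DMA transfer finishes
    void compute_tile(const std::uint8_t *tile, std::size_t n);

    void process(const std::uint8_t *dram, std::size_t numTiles) {
      if (numTiles == 0) return;
      // Two SRAM-resident buffers: while one is being computed on, the
      // other is being filled by DMA.
      static std::uint8_t sram[2][kTileBytes];

      dma_start_load(sram[0], dram, kTileBytes);  // prefetch the first tile
      for (std::size_t i = 0; i < numTiles; ++i) {
        dma_wait();                                // tile i is now in SRAM
        if (i + 1 < numTiles)                      // start fetching tile i+1
          dma_start_load(sram[(i + 1) % 2], dram + (i + 1) * kTileBytes,
                         kTileBytes);
        compute_tile(sram[i % 2], kTileBytes);     // overlaps the DMA above
      }
    }

  The alternating (i % 2) index is what gives the technique its name; production kernels typically double-buffer outputs as well and manage multiple outstanding DMA descriptors.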