Job Description
As a member of the AI Research team, you will drive the next wave of generative and multimodal intelligence powering our autonomous drywall-finishing robots. Your main focus will be turning cutting-edge vision-language and diffusion advances into robust, real-time systems that can see, reason, and act on dynamic construction sites.

Responsibilities:
- Research and innovate on diffusion-based generative models for photorealistic wall-surface simulation, defect synthesis, and domain adaptation.
- Architect and train Vision-Language Models (VLMs) and Vision-Language Alignment (VLA) objectives that connect textual work orders, CAD plans, and sensor data for pixel-level understanding.
- Lead the development of auto-annotation pipelines that scale to millions of frames and point clouds with minimal human effort, using techniques such as active learning, self-training, and synthetic data generation.
- Optimize and compress models for deployment on Jetson-class edge devices under ROS 2.
- Own the full project lifecycle, from problem definition to production hand-off to the perception and controls teams.
- Publish internal tech reports and external conference papers, and mentor interns and junior engineers.

Qualifications:
- 3+ years of experience in deep-learning R&D, or a Ph.D./M.S. in CS, EE, Robotics, or a related field with a strong publication record.
- Demonstrated expertise in diffusion models and multimodal transformers/VLMs.
- Proven track record of building large-scale, data-centric AI workflows.
- Proficiency in Python, PyTorch (or JAX), experiment tracking, and scalable training.
- Familiarity with edge-AI runtimes and CUDA/C++ performance tuning.
Joining our team offers the opportunity to own breakthrough technology from conception to deployment on active job sites. You will collaborate cross-functionally with perception, controls, and product teams, and help shape an industry by introducing intelligent robots that take over dangerous and repetitive construction labor. We offer a competitive salary, equity, a hardware budget, flexible hybrid work arrangements, and a culture that values deep work and rapid iteration. If you have skills in Vision-Language Models, Vision-Language Alignment, and R&D, hold a Ph.D./M.S. in CS, EE, or Robotics, and have experience with TensorRT, ONNX Runtime, and CUDA/C++, this role may be a perfect fit for you.