Yunqi (Richard) Gu
@richard_yunqigu
AI @stanfordailab
We are presenting from 17:00 to 19:00 today at Poster 267 in ExHall D at #CVPR25! Come check out the first #VLM #3D #Graphics benchmark! 📣📣📣
Which multimodal LLM should you be using to edit graphics in Blender? Today, we’re releasing our #CVPR2025 Highlight🌟 work, #BlenderGym 🏋️‍♀️, the first agentic 3D graphics editing benchmark that will tell you exactly how multimodal LLMs compare in their Blender-editing skills…
Was super fun exploring this! Most modern policies don't use history -- Diffusion Policy in particular gets much worse when history is added. We identify a simple ingredient that makes history help, and use it to improve the efficiency and performance of long-context policies.
Giving robot policies access to history is crucial for solving a variety of daily tasks. However, diffusion policies get worse when history is added. 🤖 In our recent work, we show how adding an auxiliary loss that we name Past-Token Prediction (PTP), together with cached embeddings…
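The shape of such an auxiliary objective can be sketched roughly as follows -- a minimal sketch assuming an MSE objective for both terms and a hypothetical `aux_weight` hyperparameter; the paper's actual PTP formulation and cached-embedding mechanics may differ:

```python
import numpy as np

def ptp_loss(pred_future, target_future, pred_past, target_past, aux_weight=0.1):
    """Total loss = future-action prediction loss + a weighted auxiliary
    past-token prediction term. Both terms are plain MSE here; `aux_weight`
    is an illustrative hyperparameter, not a value from the paper."""
    future = np.mean((pred_future - target_future) ** 2)
    past = np.mean((pred_past - target_past) ** 2)
    return future + aux_weight * past
```

The idea is that forcing the policy to reconstruct past tokens keeps historical information alive in its representation instead of being ignored.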
Kernels are the kernel of deep learning. 🙃...but writing kernels sucks. Can LLMs help? 🤔 Introducing 🌽 KernelBench (Preview), a new coding benchmark designed to evaluate the ability of LLMs to generate ⚡️efficient💨 GPU kernels for optimizing neural network performance.
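A benchmark like this presumably scores a generated kernel on two axes: numerical correctness against a reference op and wall-clock speedup. A hypothetical, framework-free sketch of such a check (`evaluate_kernel` and its signature are illustrative, not KernelBench's actual API):

```python
import time

def evaluate_kernel(candidate, reference, inputs, tol=1e-6):
    """Hypothetical KernelBench-style check: the candidate kernel must
    match the reference output elementwise, and is scored by speedup.
    Returns (is_correct, reference_time / candidate_time)."""
    out_ref = reference(*inputs)
    out_cand = candidate(*inputs)
    correct = all(abs(a - b) <= tol for a, b in zip(out_ref, out_cand))
    t0 = time.perf_counter(); reference(*inputs); t_ref = time.perf_counter() - t0
    t0 = time.perf_counter(); candidate(*inputs); t_cand = time.perf_counter() - t0
    return correct, t_ref / max(t_cand, 1e-12)
```

In the real setting, `reference` would be a PyTorch module and `candidate` a GPU kernel emitted by the LLM; the structure of the check is the same.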
Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by…
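For the unfamiliar: a TTT layer replaces an RNN's fixed hidden state with a small model whose weights are updated by gradient steps on a self-supervised loss at test time. A minimal sketch, assuming a linear hidden model and a plain reconstruction loss (the actual TTT layers in the paper are considerably more elaborate):

```python
import numpy as np

def ttt_linear(xs, dim, lr=0.1):
    """Toy TTT layer: the 'hidden state' is a linear map W.
    For each token x, emit z = W x, then take one gradient step on the
    self-supervised reconstruction loss ||W x - x||^2."""
    W = np.zeros((dim, dim))
    outs = []
    for x in xs:
        outs.append(W @ x)
        grad = 2.0 * np.outer(W @ x - x, x)  # d/dW of ||W x - x||^2
        W = W - lr * grad
    return np.stack(outs)
```

Because the hidden state is itself a learner, its capacity to compress long context grows with the update rule rather than a fixed state size, which is what makes minute-long video feasible.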
Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow) github.com/lakonik/gmflow GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.
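A toy sketch of the core idea: instead of predicting a single denoised point, the network predicts a Gaussian mixture, whose mean reduces to the standard diffusion prediction when K=1. Assuming softmax-weighted mixture logits (function and variable names are illustrative, not from the GMFlow codebase):

```python
import numpy as np

def gm_mean(logits, means):
    """Mean of a predicted Gaussian mixture: E[x] = sum_k w_k * mu_k,
    with w = softmax(logits). `logits` has shape (K,), `means` (K, D)."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()                      # softmax over mixture components
    return (w[:, None] * means).sum(axis=0)
```

Carrying the full mixture (rather than only its mean) through the sampling steps is what lets a few-step sampler stay precise where a single Gaussian would over-smooth multimodal targets.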