Rundi Wu
@ChrisWu6080
Research Scientist @GoogleDeepMind | prev CS PhD @Columbia
🚀 Introducing CAT4D! 🚀 CAT4D transforms any real or generated video into dynamic 3D scenes with a multi-view video diffusion model. The outputs are dynamic 3D models that we can freeze and look at from novel viewpoints, in real-time! Be sure to try our interactive viewer!
I’ll be presenting CAT4D this Sunday at 1:45pm and our poster session will start afterwards at 4pm. Feel free to come and say hi! cvpr.thecvf.com/virtual/2025/o…
Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch…
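Not from the thread itself, but a minimal sketch of the large-chunk test-time-training idea as I read it, assuming the memory is a small fast-weight MLP updated by gradient steps per chunk (the function names, chunk size, and loss are hypothetical, not the paper's implementation):

```python
import torch

def lact_step(fast_mlp, keys, values, lr=1e-2, inner_steps=1):
    """Write one large chunk into the nonlinear memory via gradient steps.

    Hypothetical sketch: the "memory" is the MLP's weights, updated at
    test time so that it maps this chunk's keys to its values.
    """
    opt = torch.optim.SGD(fast_mlp.parameters(), lr=lr)
    for _ in range(inner_steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(fast_mlp(keys), values)
        loss.backward()
        opt.step()
    return fast_mlp

# Usage: process a sequence chunk by chunk; queries read the updated memory.
d, chunk = 64, 2048  # hypothetical sizes; large chunks amortize the update
memory = torch.nn.Sequential(
    torch.nn.Linear(d, 4 * d), torch.nn.SiLU(), torch.nn.Linear(4 * d, d)
)
k, v, q = torch.randn(chunk, d), torch.randn(chunk, d), torch.randn(chunk, d)
memory = lact_step(memory, k, v)
out = memory(q)  # retrieve from the nonlinear memory
```

Updating once per large chunk rather than per token is what makes the nonlinear (MLP) memory efficient on modern hardware: the inner gradient step becomes a few large matmuls instead of many tiny sequential ones.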
⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test time optimization. 🧵👇 (1/9)
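To unpack "*directly* generates 3D Gaussians": one plausible reading (my own sketch, not Bolt3D's actual architecture) is an output head that decodes each denoised latent into the parameters of a Gaussian primitive, so no per-scene test-time optimization is needed:

```python
import torch

class GaussianHead(torch.nn.Module):
    """Hypothetical head mapping latents to 3D Gaussian parameters."""

    def __init__(self, latent_dim=128):
        super().__init__()
        # 3 position + 3 scale + 4 rotation (quaternion) + 1 opacity + 3 color = 14
        self.proj = torch.nn.Linear(latent_dim, 14)

    def forward(self, latent):  # latent: (num_gaussians, latent_dim)
        p = self.proj(latent)
        return {
            "position": p[:, 0:3],
            "scale": torch.exp(p[:, 3:6]),  # keep scales positive
            "rotation": torch.nn.functional.normalize(p[:, 6:10], dim=-1),
            "opacity": torch.sigmoid(p[:, 10:11]),
            "color": torch.sigmoid(p[:, 11:14]),
        }

head = GaussianHead()
gaussians = head(torch.randn(4096, 128))  # one Gaussian per decoded latent
```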
Super cool results! Congrats!
Very excited to share Stable Virtual Camera, a generalist diffusion model for view synthesis: stable-virtual-camera.github.io It scales well with data and works out of the box for different NVS tasks. Code and 🤗 demo are released! 🧵(1/N)
Very impressive large-scale 3D scene generation results!!!
Big @odysseyml news! We’re unveiling Explorer, a generative world model. Explorer transforms text or an image into a realized, detailed 3D world. I'm also incredibly excited that @edcatmull, the legendary Pixar founder, is joining our board & investing. Read on for more...
Introducing 👀Stereo4D👀 A method for mining 4D from internet stereo videos. It enables large-scale, high-quality, dynamic, *metric* 3D reconstructions, with camera poses and long-term 3D motion trajectories. We used Stereo4D to make a dataset of over 100k real-world 4D scenes.
How can we perform robust 3D reconstruction when captures contain inconsistencies (e.g., dynamics or lighting changes)? Check out @alextrevith's SimVS --- simulating world inconsistencies using video generation models for robust view synthesis!
🚀 Introducing SimVS: our new method that simplifies 3D capture! 🎯 3D reconstruction assumes consistency—no dynamics or lighting changes—but reality constantly breaks this assumption. ✨ SimVS takes a set of inconsistent images and makes them consistent with a chosen frame.
Introducing MegaSaM! 🎥 Accurate, fast, & robust structure + camera estimation from casual monocular videos of dynamic scenes! MegaSaM outputs camera parameters and consistent video depth, scaling to long videos with unconstrained camera paths and complex scene dynamics!
A common question nowadays: Which is better, diffusion or flow matching? 🤔 Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.
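For reference, here is my own sketch of the standard Gaussian-path correspondence behind that equivalence (my notation, not quoted from the blog post): with a noising path $x_t = \alpha_t x_0 + \sigma_t \epsilon$, the flow matching velocity and the diffusion score are linear functions of each other, so a model trained for one immediately yields the other.

```latex
% Sketch under the Gaussian path x_t = \alpha_t x_0 + \sigma_t \epsilon:
\begin{align*}
  v_t(x) &= \dot{\alpha}_t\, \mathbb{E}[x_0 \mid x_t = x]
          + \dot{\sigma}_t\, \mathbb{E}[\epsilon \mid x_t = x] \\
         &= \frac{\dot{\alpha}_t}{\alpha_t}\, x
          - \sigma_t\!\left(\dot{\sigma}_t
              - \frac{\dot{\alpha}_t \sigma_t}{\alpha_t}\right)
            \nabla_x \log p_t(x),
\end{align*}
% using \nabla_x \log p_t(x) = -\mathbb{E}[\epsilon \mid x_t = x]/\sigma_t.
```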
Woohoo, big congrats to the World Labs team! Tech looks similar to CAT3D (cat3d.github.io): multi-view diffusion model + 3DGS, maybe w/360 data + depth priors. To bring these worlds to life with dynamics, check out our new work on CAT4D: cat-4d.github.io 😺
We’ve been busy building an AI system to generate 3D worlds from a single image. Check out some early results on our site, where you can interact with our scenes directly in the browser! worldlabs.ai/blog 1/n
Stop watching videos, start interacting with worlds. Stoked to share CAT4D, our new method for turning videos into dynamic 3D scenes that you can move through in real-time!
Check out our new paper that turns (text, sparse images, videos) => (dynamic 3D scenes)! I can't get over how cool the interactive demo is. Try it out for yourself on the project page: cat-4d.github.io
At #ECCV2024, we presented Minimalist Vision with Freeform Pixels, a new vision paradigm that uses a small number of freeform pixels to solve lightweight vision tasks. We are honored to have received the Best Paper Award! Check out the project here: cave.cs.columbia.edu/projects/categ…
I'm at #CVPR2024 in Seattle this week. Happy to chat about anything! Please come visit our ReconFusion poster on Friday, 21 Jun, at 10:30 a.m., Arch 4A-E, Poster #193. reconfusion.github.io
Check out our Generative Camera Dolly🎥 tl;dr: fine-tuning a video model for novel view synthesis of dynamic scenes.
Excited to share our new paper on large-angle monocular dynamic novel view synthesis! Given a single RGB video, we propose a method that can imagine what that scene would look like from any other viewpoint. Website: gcd.cs.columbia.edu Paper: arxiv.org/abs/2405.14868 🧵(1/5)
I’ll be at #ICLR2024 Vienna during May 7-11! Come and check out our paper! Happy to chat about anything! Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape. Poster on May 9 at 10:45am.
Given a single 3D asset, can we generate its variations without relying on prior knowledge? Introducing Sin3DM ✨, a diffusion model that learns from a single 3D asset and generates high-quality variations with fine geometry and texture details. sin3dm.github.io [1/4]
3D Gaussian is great, but how can you interact with it 🌹👋? Introducing #PhysDreamer: Create your own realistic interactive 3D assets from only static images! Discover how we do this below👇 🧵1/: Website: physdreamer.github.io