Zeren Jiang
@CodyJzr
PhD student @ Oxford VGG
🎁 We present Geo4D, a method that repurposes a video diffusion model for monocular 4D reconstruction.
Project page: geo4d.github.io
Code repo: github.com/jzr99/Geo4D
𝐌𝐚𝐢𝐧 𝐂𝐨𝐧𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧𝐬:
✨ A novel framework, Geo4D, to reconstruct the dynamic scene,…
Playing with 4D scenes, part 2. With the same monocular video input, Geo4D (github.com/jzr99/Geo4D) now gives a more robust and cleaner 4D reconstruction. CRAZYYY. I can't imagine what's next. 4DGS from monocular video? I think it's already feasible.
Today's preprint: Jiang et al., "Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction" -- geo4d.github.io
Fine-tunes a video model to estimate ray maps, point maps, and depth, then aggregates the estimates from sliding windows through multi-modal alignment.
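The sliding-window aggregation mentioned above can be illustrated with a deliberately simplified sketch. It is not Geo4D's multi-modal alignment, which combines several predicted modalities. Here we assume a single modality (per-pixel disparity or depth values), equal-length windows that start `stride` frames apart, and we align each new window to the already-fused result with a least-squares scale and shift fitted on the overlapping frames. The names `fit_scale_shift` and `stitch_windows` are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_scale_shift(src: List[float], ref: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for (s, t) minimizing sum((s * src + t - ref) ** 2)."""
    n = len(src)
    if n != len(ref) or n < 2:
        raise ValueError("need at least two matching samples")
    mean_s = sum(src) / n
    mean_r = sum(ref) / n
    var_s = sum((x - mean_s) ** 2 for x in src)
    if var_s == 0:
        raise ValueError("source values are constant; scale is undefined")
    cov = sum((x - mean_s) * (y - mean_r) for x, y in zip(src, ref))
    s = cov / var_s
    return s, mean_r - s * mean_s


def stitch_windows(windows: List[List[List[float]]], stride: int) -> List[List[float]]:
    """Fuse overlapping windows of per-frame pixel values into one sequence.

    Window k covers frames [k * stride, k * stride + window_length). Each window
    (after the first) is mapped into the fused result's scale/shift using the
    frames it shares with it, then all estimates of a frame are averaged.
    """
    win_len = len(windows[0])
    if not 0 < stride < win_len:
        raise ValueError("consecutive windows must overlap")
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}

    def fused(f: int) -> List[float]:
        return [v / counts[f] for v in sums[f]]

    for k, window in enumerate(windows):
        start = k * stride
        if k == 0:
            s, t = 1.0, 0.0  # the first window defines the reference frame of values
        else:
            src: List[float] = []
            ref: List[float] = []
            for i, frame in enumerate(window):
                if start + i in counts:  # frame already covered by earlier windows
                    src.extend(frame)
                    ref.extend(fused(start + i))
            s, t = fit_scale_shift(src, ref)
        for i, frame in enumerate(window):
            f = start + i
            aligned = [s * v + t for v in frame]
            if f in sums:
                sums[f] = [a + b for a, b in zip(sums[f], aligned)]
                counts[f] += 1
            else:
                sums[f] = aligned
                counts[f] = 1
    return [fused(f) for f in sorted(sums)]
```

For example, if a second window reproduces the true values only up to an unknown scale of 2 and shift of 1, the fit on the two shared frames undoes that ambiguity before averaging. Aligning to the running fused result, rather than only to the previous window, keeps every window tied to the first window's scale and shift; a real system would also need confidence weighting and, per the post, the other modalities.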
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
@CodyJzr, @ChuanxiaZ, Iro Laina, @dlarlus, Andrea Vedaldi
tl;dr: point+disparity+ray maps -> pre-trained video diffusion model -> CLIP -> query transformer -> U-Net -> VAE decoder -> alignment
arxiv.org/abs/2504.07961
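Ray maps are one of the modalities listed in the tl;dr. The post does not say how a ray map is parameterized, so the sketch below makes an explicit assumption: each pixel stores its ray in Plücker form, a unit direction d plus a moment m = o x d, as in some ray-based camera representations. Since all pixel rays of one frame pass through the camera center, a least-squares fit recovers that center even from noisy rays by solving (sum of (I - d d^T)) c = sum of (d x m). The helpers `plucker_ray` and `camera_center` are hypothetical, not Geo4D code.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plucker_ray(origin: Sequence[float], direction: Sequence[float]) -> Tuple[Vec3, Vec3]:
    """Assumed encoding: (unit direction d, moment m = origin x d)."""
    norm = sum(x * x for x in direction) ** 0.5
    d = (direction[0] / norm, direction[1] / norm, direction[2] / norm)
    return d, cross(origin, d)


def det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def camera_center(rays: Sequence[Tuple[Vec3, Vec3]]) -> Vec3:
    """Point closest (least squares) to all rays: solve A c = b with
    A = sum(I - d d^T) and b = sum(d x m)."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0, 0.0, 0.0]
    for d, m in rays:
        dm = cross(d, m)  # equals (I - d d^T) c when m = c x d exactly
        for i in range(3):
            b[i] += dm[i]
            for j in range(3):
                a[i][j] += (1.0 if i == j else 0.0) - d[i] * d[j]
    det = det3(a)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; the center is not unique")
    center = []
    for col in range(3):
        # Cramer's rule: replace column `col` of A with b.
        m_col = [[b[i] if j == col else a[i][j] for j in range(3)] for i in range(3)]
        center.append(det3(m_col) / det)
    return (center[0], center[1], center[2])
```

With rays cast from (1, 2, 3) in a few non-parallel directions, `camera_center` returns (1, 2, 3) up to floating-point error; if every ray is parallel the system is singular and the function raises instead of guessing.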