Chen Geng
@gengchen01
CS Ph.D. Student @Stanford. Previously Hons. B.Eng. in CS @ZJU_China.
Ever wondered how roses grow and wither in your backyard?🌹 Our latest work on generating 4D temporal object intrinsics lets you explore a rose's entire lifecycle—from birth to death—under any environment light, from any viewpoint, at any moment. Project page:…
📷 New Preprint: SOTA optical flow extraction from pre-trained generative video models! While it seems intuitive that video models grasp optical flow, extracting that understanding has proven surprisingly elusive.
We prompt a generative video model to extract state-of-the-art optical flow, using zero labels and no fine-tuning. Our method, KL-tracing, achieves SOTA results on TAP-Vid & generalizes to challenging YouTube clips. @khai_loong_aw @KlemenKotar @CristbalEyzagu2 @lee_wanhee_…
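A minimal sketch of the counterfactual-perturbation idea behind zero-shot flow extraction of this kind: perturb the query pixel, run the video model with and without the perturbation, and read off where the next-frame predictive distribution changes most (largest KL divergence). The `video_model` interface, tensor shapes, and perturbation scheme below are illustrative assumptions, not the released KL-tracing code.

```python
import torch
import torch.nn.functional as F

def kl_trace_point(video_model, frames, query_xy, patch=3):
    """Sketch: track one query pixel with a frozen generative video model.

    Assumptions (hypothetical API): `video_model(frames)` returns per-pixel
    next-frame logits of shape (H, W, V) over a discrete codebook;
    `frames` is (T, C, H, W) with values in [0, 1].
    """
    x, y = query_xy

    # Clean forward pass: predictive logits for the next frame.
    logits_clean = video_model(frames)                      # (H, W, V)

    # Counterfactual pass: inject a small local perturbation at the query
    # pixel of the query frame (here, the last input frame).
    perturbed = frames.clone()
    perturbed[-1, :, y - patch // 2 : y + patch // 2 + 1,
                    x - patch // 2 : x + patch // 2 + 1] += 0.5
    perturbed = perturbed.clamp(0.0, 1.0)
    logits_pert = video_model(perturbed)                    # (H, W, V)

    # Per-pixel KL(perturbed || clean) between the two predictive distributions.
    log_p = F.log_softmax(logits_pert, dim=-1)
    log_q = F.log_softmax(logits_clean, dim=-1)
    kl_map = (log_p.exp() * (log_p - log_q)).sum(dim=-1)    # (H, W)

    # The pixel whose prediction changes the most is taken as the flow endpoint.
    flat_idx = kl_map.flatten().argmax()
    H, W = kl_map.shape
    return (int(flat_idx % W), int(flat_idx // W))          # (x', y')
```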
🚀 We release SpatialTrackerV2: the first feedforward model for dynamic 3D reconstruction and 3D point tracking — all at once! Reconstruct dynamic scenes and predict pixel-wise 3D motion in seconds. 🔗 Webpage: spatialtracker.github.io 🔍 Online Demo: huggingface.co/spaces/Yuxihen…
In our #ICCV2025 paper WonderPlay, we study how to combine physical simulation with video generative priors to enable 3D-action interaction with the world from a single image! Check the 🧵 for more details!
#ICCV2025 🤩3D world generation is cool, but it is cooler to play with the worlds using 3D actions 👆💨, and see what happens! — Introducing *WonderPlay*: Now you can create dynamic 3D scenes that respond to your 3D actions from a single image! Web: kyleleey.github.io/WonderPlay/ 🧵1/7
🤖 Household robots are becoming physically viable. But interacting with people in the home requires handling unseen, unconstrained, dynamic preferences, not just a complex physical domain. We introduce ROSETTA: a method to cheaply generate rewards for such preferences. 🧵⬇️
📢 Call for Papers - We are organizing @ICCVConference Workshop on Generating Digital Twins from Images and Videos (gDT-IV) at #ICCV2025! We welcome submissions in two tracks: 📅 Deadline for Archival Paper Track: June 27 ⏰ Deadline for Non-Archival Paper Track: July 31 🌐…

(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.
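A minimal sketch of the general product-of-experts idea (not the paper's specific sampler): each heterogeneous expert assigns a log-score to every candidate generation, the scores are summed, and the composed distribution over candidates is the softmax of the sum. The expert interface and weighting are assumptions for illustration.

```python
import torch

def product_of_experts_rerank(candidates, experts, weights=None):
    """Inference-time PoE composition over a candidate set.

    candidates: list of generated samples (e.g. images or scene layouts).
    experts:    list of callables, each mapping a candidate to a scalar
                float log-score (hypothetical interface for a generative
                model, a VLM verifier, or a simulator-based checker).
    """
    weights = weights or [1.0] * len(experts)

    # Unnormalized log-probability of each candidate under the product.
    log_scores = torch.tensor([
        sum(w * expert(c) for w, expert in zip(weights, experts))
        for c in candidates
    ])

    # Softmax over candidates approximates the product density
    # restricted to the candidate set.
    probs = torch.softmax(log_scores, dim=0)

    # Return the most likely candidate along with the composed distribution.
    best = int(torch.argmax(probs))
    return candidates[best], probs
```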
No labels, no priors -- just learning from raw data. Our latest work learns unified 4D motion representations for dynamic objects in a fully self-supervised way. Check out this work led by our awesome intern @AlexHe00880585! 🚀
💫 Animating 4D objects is complex: traditional methods rely on handcrafted, category-specific rigging representations. 💡 What if we could learn unified, category-agnostic, and scalable 4D motion representations — from raw, unlabeled data? 🚀 Introducing CANOR at #CVPR2025: a…
🪄Introducing Anymate—a large-scale dataset of 230K 3D assets with rigging and skinning annotations! With this dataset, we trained an auto-rigging model and benchmarked a variety of architectures! 🔥Turn static assets into animatable ones in seconds: huggingface.co/spaces/yfdeng/…
How do we scale visual affordance learning that is fine-grained, task-conditioned, and works in the wild in dynamic environments? Introducing Unsupervised Affordance Distillation (UAD): it distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this…
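A minimal sketch of the distillation step under stated assumptions: affordance pseudo-heatmaps mined from off-the-shelf foundation models (no human labels) are regressed by a small task-conditioned head on top of frozen visual features. The tensor shapes and head design are illustrative, not the UAD pipeline.

```python
import torch
import torch.nn as nn

def distill_affordance_head(backbone_feats, pseudo_heatmaps, task_embs,
                            epochs=10, lr=1e-3):
    """Distill foundation-model pseudo-labels into a tiny affordance head.

    backbone_feats:  (N, D, H, W) frozen features from a vision foundation model.
    pseudo_heatmaps: (N, H, W)    affordance pseudo-labels mined without humans.
    task_embs:       (N, T)       embeddings of the task/instruction text.
    """
    N, D, H, W = backbone_feats.shape
    T = task_embs.shape[1]
    head = nn.Conv2d(D + T, 1, kernel_size=1)   # tiny task-conditioned head
    opt = torch.optim.Adam(head.parameters(), lr=lr)

    for _ in range(epochs):
        # Broadcast the task embedding over the spatial grid and concatenate.
        task_maps = task_embs[:, :, None, None].expand(N, T, H, W)
        inputs = torch.cat([backbone_feats, task_maps], dim=1)
        pred = head(inputs).squeeze(1)           # (N, H, W)
        loss = nn.functional.mse_loss(torch.sigmoid(pred), pseudo_heatmaps)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return head
```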
🤖Introducing TWIST: Teleoperated Whole-Body Imitation System. We develop a humanoid teleoperation system to enable coordinated, versatile, whole-body movements, using a single neural network. This is our first step toward general-purpose robots. 🌐humanoid-teleop.github.io
🔥Spatial intelligence requires world generation, and now we have the first comprehensive evaluation benchmark📏 for it! Introducing WorldScore: Unifying evaluation for 3D, 4D, and video models on world generation! 🧵1/7 Web: haoyi-duan.github.io/WorldScore/ arxiv: arxiv.org/abs/2504.00983
One day left before submissions close!
Submission deadline has been extended by a week to April 4. Submit your latest 4D work to the workshop @CVPR: 4D Gaussians, point tracking, dynamic SLAMs, egocentric, human motion, multi-modal world models, embodied AI... you name it! 4dvisionworkshop.github.io
🎉 Our paper "PGC: Physics-Based Gaussian Cloth from a Single Pose" has been accepted to #CVPR2025! 👕 PGC uses a PBR + 3DGS representation to render simulation-ready garments under novel lighting and motion, all from a single static frame. ✨Web: phys-gaussian-cloth.github.io 🧵1/4
🔥Want to capture 3D dancing fluids♨️🌫️🌪️💦? No specialized equipment, just one video! Introducing FluidNexus: Now you only need one camera to reconstruct 3D fluid dynamics and predict future evolution! 🧵1/4 Web: yuegao.me/FluidNexus/ Arxiv: arxiv.org/pdf/2503.04720
Extracting structure that’s implicitly learned by video foundation models _without_ relying on labeled data is a fundamental challenge. What’s a better place to start than extracting motion? Temporal correspondence is a key building block of perception. Check out our paper!
New paper on self-supervised optical flow and occlusion estimation from video foundation models. @sstj389 @jiajunwu_cs @SeKim1112 @Rahul_Venkatesh tinyurl.com/dpa3auzd
Can we reconstruct relightable human hair appearance from real-world visual observations? We introduce GroomLight, a hybrid inverse rendering method for relightable human hair appearance modeling. syntec-research.github.io/GroomLight/
Spatial reasoning is a major challenge for foundation models today, even in simple tasks like arranging objects in 3D space. #CVPR2025 Introducing LayoutVLM, a differentiable optimization framework that uses a VLM to spatially reason about diverse scene layouts from unlabeled…
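A minimal sketch of VLM-guided differentiable layout optimization under stated assumptions: a VLM is assumed to have emitted pairwise spatial constraints (the `('left_of' | 'near', i, j)` format is hypothetical), and object positions are refined by gradient descent on differentiable penalty terms. Illustrative only, not the LayoutVLM implementation.

```python
import torch

def optimize_layout(n_objects, constraints, steps=500, lr=0.05):
    """Refine 2D object positions to satisfy VLM-proposed spatial constraints."""
    # Object positions, initialized randomly and optimized directly.
    pos = torch.randn(n_objects, 2, requires_grad=True)
    opt = torch.optim.Adam([pos], lr=lr)

    for _ in range(steps):
        loss = pos.new_zeros(())
        for kind, i, j in constraints:
            if kind == "left_of":
                # Penalize object i being to the right of object j.
                loss = loss + torch.relu(pos[i, 0] - pos[j, 0] + 0.5)
            elif kind == "near":
                # Pull the two objects toward a target separation of ~1 unit.
                loss = loss + (torch.norm(pos[i] - pos[j]) - 1.0) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()

    return pos.detach()

# Example: three objects with two VLM-proposed constraints.
layout = optimize_layout(3, [("left_of", 0, 1), ("near", 1, 2)])
```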
Modern generative models of images and videos rely on tokenizers. Can we build a state-of-the-art discrete image tokenizer with a diffusion autoencoder? Yes! I’m excited to share FlowMo, with @kylehkhsu, @jcjohnss, @drfeifei, @jiajunwu_cs. A thread 🧵:
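A minimal sketch of the general diffusion-autoencoder tokenizer idea, with loudly labeled assumptions: the encoder maps the image to discrete tokens (sign-based binary quantization with a straight-through gradient is an illustrative choice), and the decoder is trained with a simple denoising objective conditioned on those tokens. This is not the FlowMo architecture or training recipe.

```python
import torch
import torch.nn as nn

class DiffusionAutoencoderTokenizer(nn.Module):
    """Toy diffusion-autoencoder tokenizer for 32x32 RGB images."""

    def __init__(self, dim=64, code_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim),
                                     nn.ReLU(), nn.Linear(dim, code_dim))
        self.decoder = nn.Sequential(nn.Linear(3 * 32 * 32 + code_dim + 1, dim),
                                     nn.ReLU(), nn.Linear(dim, 3 * 32 * 32))

    def forward(self, x):
        b = x.shape[0]
        z = self.encoder(x)

        # Binary tokens with a straight-through estimator (illustrative choice).
        tokens = torch.sign(z)
        z_q = z + (tokens - z).detach()

        # Denoising objective: the decoder predicts the clean image from a
        # noisy image, the noise level, and the discrete code.
        t = torch.rand(b, 1)
        noise = torch.randn_like(x.flatten(1))
        x_noisy = (1 - t) * x.flatten(1) + t * noise
        pred = self.decoder(torch.cat([x_noisy, z_q, t], dim=1))
        loss = ((pred - x.flatten(1)) ** 2).mean()
        return loss, tokens
```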