Yang Zheng
@yang_zheng18
PhD student @Stanford
Can we reconstruct relightable human hair appearance from real-world visual observations? We introduce GroomLight, a hybrid inverse rendering method for relightable human hair appearance modeling. syntec-research.github.io/GroomLight/
AllTracker: Efficient Dense Point Tracking at High Resolution. If you're using any point tracker in any project, this is likely a drop-in upgrade—improving speed, accuracy, and density, all at once.
📢 Introducing DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models. Compared to vanilla DPO, we improve paired data construction and preference label granularity, leading to better visual quality and motion strength with only 1/3 of the data. 🧵
Most video models 🤯forget the past 🐌slow down over time 🔁rely on bidirectional (not causal) attention Our state-space video world models (SSMs) 🧠remember across hundreds of frames ⚡️generate at constant speed ⏩are fully causal, enabling real-time rollout 1/3
More cool videos🔥 and details available on our website: dsaurus.github.io/isa4d/
Video generation of humans with control over body pose and facial expressions is crucial for a plethora of applications. Towards this goal, we introduce a new interspatial attention (ISA) mechanism as a scalable building block for DiT–based video generation models #SIGGRAPH2025
Curious about how cities have changed in the past decade? We use MLLMs to analyse 40 million Street View images to answer this. Did you know that "juice shops became a thing in NYC" and "miles of overpasses were painted BLUE in SF"? More at→boyangdeng.com/visual-chronic… (vid ↓ w/ 🔊)
Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow) github.com/lakonik/gmflow GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.
Introducing CoT-VLA – Visual Chain-of-Thought reasoning for Robot Foundation Models! 🤖 By leveraging next-frame prediction as visual chain-of-thought reasoning, CoT-VLA uses future prediction to guide action generation and unlock large-scale video data for training. #CVPR2025
🔥Want to capture 3D dancing fluids♨️🌫️🌪️💦? No specialized equipment, just one video! Introducing FluidNexus: Now you only need one camera to reconstruct 3D fluid dynamics and predict future evolution! 🧵1/4 Web: yuegao.me/FluidNexus/ Arxiv: arxiv.org/pdf/2503.04720
🏡Building realistic 3D scenes just got smarter! Introducing our #CVPR2025 work, 🔥FirePlace, a framework that enables Multimodal LLMs to automatically generate realistic and geometrically valid placements for objects into complex 3D scenes. How does it work?🧵👇
Can robots leverage their entire body to sense and interact with their environment, rather than just relying on a centralized camera and end-effector? Introducing RoboPanoptes, a robot system that achieves whole-body dexterity through whole-body vision. robopanoptes.github.io
🤖 Introducing Human-Object Interaction from Human-Level Instructions! The first complete system that generates physically plausible, long-horizon human-object interactions with finger motions in contextual environments, driven by human-level instructions. 🔍 Our approach: - LLMs…
Do large multimodal models understand how to make dresses for your winter holiday party💃? We introduce AIpparel, a vision-language-garment model capable of generating and editing simulation-ready sewing patterns from text and images. Project page at georgenakayama.github.io/AIpparel/.…