Ziyi Wu
@Dazitu_616
CS PhD @UofT | Student Researcher @Google | Prev Research Intern @Snap, Undergrad @Tsinghua_Uni
📢 Introducing DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Compared to vanilla DPO, we improve paired data construction and preference label granularity, leading to better visual quality and motion strength with only 1/3 of the data. 🧵
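For intuition, here is a minimal sketch of what a segment-level (densely labeled) DPO objective for a video diffusion model could look like, built on the Diffusion-DPO-style implicit reward. The segment pooling, the `beta` value, and all function and argument names are my own illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def add_noise(x, noise, t, alphas_cumprod):
    # Standard DDPM forward process; t indexes a pre-computed noise schedule.
    a = alphas_cumprod[t].view(-1, *([1] * (x.dim() - 1)))
    return a.sqrt() * x + (1 - a).sqrt() * noise

def per_frame_error(model, latents, t, cond, alphas_cumprod):
    # Per-frame denoising MSE, shape (B, T); latents are (B, T, C, H, W).
    noise = torch.randn_like(latents)
    noisy = add_noise(latents, noise, t, alphas_cumprod)
    pred = model(noisy, t, cond)                      # model predicts the added noise
    return (pred - noise).pow(2).flatten(2).mean(-1)

def dense_dpo_loss(model, ref_model, win, lose, seg_labels, t, cond,
                   alphas_cumprod, beta=500.0):
    """
    win, lose:  latents of the two clips in a pair, (B, T, C, H, W)
    seg_labels: per-segment preference in {+1, -1, 0}, (B, S); 0 = tie, dropped
    """
    err_w = per_frame_error(model, win, t, cond, alphas_cumprod)
    err_l = per_frame_error(model, lose, t, cond, alphas_cumprod)
    with torch.no_grad():
        ref_w = per_frame_error(ref_model, win, t, cond, alphas_cumprod)
        ref_l = per_frame_error(ref_model, lose, t, cond, alphas_cumprod)

    B, T = err_w.shape
    S = seg_labels.shape[1]
    seg = lambda x: x.view(B, S, T // S).mean(-1)     # pool frames into S equal segments

    # Implicit reward margin per segment, as in Diffusion-DPO but at segment granularity.
    margin = (seg(err_w) - seg(ref_w)) - (seg(err_l) - seg(ref_l))
    logits = -beta * margin * seg_labels              # a -1 label swaps the roles of the two clips
    mask = (seg_labels != 0).float()
    return -(F.logsigmoid(logits) * mask).sum() / mask.sum().clamp(min=1)
```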
Thanks NYC 🗽🍎 It was my first show in New York! Thank you for singing so many songs with me♡ Today was Another Great Day!! #LiSA_NATOUR Next up: NYC day 2✌︎ See you tomorrow〜🗽♥️
🚀 Introducing UniRelight, a general-purpose relighting framework powered by video diffusion models. 🌟UniRelight jointly models the distribution of scene intrinsics and illumination, enabling high-quality relighting and intrinsic decomposition from a single image or video.
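As a rough illustration of "jointly modeling" relit appearance and intrinsics with one video diffusion backbone, a sketch like the following concatenates relit-video and albedo latents and denoises them together under an illumination condition. The channel layout and module names are assumptions, not UniRelight's actual design.

```python
import torch
import torch.nn as nn

class JointRelightDenoiser(nn.Module):
    """Wraps any video diffusion backbone (placeholder) to denoise the relit video
    and its albedo as concatenated latent channels, conditioned on the input video
    latents and a target-illumination embedding."""
    def __init__(self, backbone: nn.Module, light_dim=512, cond_dim=768):
        super().__init__()
        self.backbone = backbone
        self.light_proj = nn.Linear(light_dim, cond_dim)

    def forward(self, noisy_joint, t, input_latents, light_emb):
        # noisy_joint:   (B, T, 2*C, H, W) noisy [relit video ; albedo] latents
        # input_latents: (B, T, C, H, W)   clean latents of the input video (condition)
        # light_emb:     (B, light_dim)    embedding of the target illumination
        cond = self.light_proj(light_emb)
        x = torch.cat([noisy_joint, input_latents], dim=2)    # channel-wise concatenation
        return self.backbone(x, t, cond)                      # predicts noise for both outputs
```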
Will be at CVPR this week. Excited to catch up with old friends and connect with new ones!
Thrilled to share the papers that our lab will present at @CVPR. Learn more in this thread 🧵 and meet @Kai__He, @yash2kant, @Dazitu_616, and our previous visitor @toshiya427 in Nashville! 1/n
I'll be presenting our #ICLR2025 poster "SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation" (#189, Thu afternoon), trajectory conditioned i2v generation w/o fine-tuning. Feel free to drop by if you are interested in exploring the zero-shot capabilities of VDMs!
Thrilled to share SG-I2V, a tuning-free method for trajectory-controllable image-to-video (i2v) generation, solely built on the knowledge present in a pre-trained i2v diffusion model! kmcode1.github.io/Projects/SG-I2… w/ @sherwinbahmani @Dazitu_616 @yash2kant @igilitschenski @DaveLindell
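A hedged sketch of the general tuning-free recipe such methods rely on: at selected denoising steps, optimize the video latents so that the frozen model's intermediate features inside the user's trajectory boxes agree across frames. `feature_fn`, the box format, and the hyperparameters below are placeholders, not SG-I2V's exact algorithm.

```python
import torch

def crop_and_pool(feat, box):
    # Average the feature map inside a (x0, y0, x1, y1) box in feature-map coordinates.
    x0, y0, x1, y1 = box
    return feat[:, y0:y1, x0:x1].mean(dim=(1, 2))

def trajectory_guidance_step(latents, t, feature_fn, boxes, lr=0.2, n_iters=5):
    """
    latents:    (T, C, H, W) video latents at denoising step t
    feature_fn: callable (latents, t) -> (T, C_f, H_f, W_f) intermediate features
                of the frozen i2v model (e.g. via a forward hook) -- assumed helper
    boxes:      T boxes giving the desired object trajectory
    """
    latents = latents.detach().requires_grad_(True)
    opt = torch.optim.Adam([latents], lr=lr)
    for _ in range(n_iters):
        feats = feature_fn(latents, t)
        anchor = crop_and_pool(feats[0], boxes[0])        # object features in frame 0
        loss = sum((crop_and_pool(feats[k], boxes[k]) - anchor).pow(2).mean()
                   for k in range(1, feats.shape[0]))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latents.detach()                               # plug back into the sampler
```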
Can we reconstruct relightable human hair appearance from real-world visual observations? We introduce GroomLight, a hybrid inverse rendering method for relightable human hair appearance modeling. syntec-research.github.io/GroomLight/
🚀Excited to introduce GEN3C #CVPR2025, a generative video model with an explicit 3D cache for precise camera control. 🎥It applies to multiple use cases, including single-view and sparse-view NVS🖼️ and challenging settings like monocular dynamic NVS and driving simulation🚗.…
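A rough sketch of what an "explicit 3D cache" pipeline can look like: unproject the input frame's depth into a colored point cloud, re-render it along the target camera path, and use those renders to condition the video model. The function names and the naive z-buffer splatting below are illustrative, not GEN3C's implementation.

```python
import torch

def build_3d_cache(image, depth, K, cam_to_world):
    """Unproject pixels into a world-space colored point cloud (the 3D cache)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], -1).float()        # (H, W, 3) homogeneous pixels
    rays = (torch.linalg.inv(K) @ pix.reshape(-1, 3).T).T            # camera-space rays (z = 1)
    pts_cam = rays * depth.reshape(-1, 1)                            # scale by depth
    pts_world = (cam_to_world[:3, :3] @ pts_cam.T).T + cam_to_world[:3, 3]
    return pts_world, image.reshape(-1, 3)

def render_cache(pts, colors, K, world_to_cam, H, W):
    """Project the cache into a target view; nearest-point z-buffer splatting."""
    cam = (world_to_cam[:3, :3] @ pts.T).T + world_to_cam[:3, 3]
    z = cam[:, 2].clamp(min=1e-6)
    uv = (K @ (cam / z.unsqueeze(1)).T).T[:, :2].round().long()
    img = torch.zeros(H, W, 3)
    zbuf = torch.full((H, W), float("inf"))
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    for p, d, c in zip(uv[valid], z[valid], colors[valid]):
        if d < zbuf[p[1], p[0]]:
            zbuf[p[1], p[0]] = d
            img[p[1], p[0]] = c
    return img   # per-frame renders like this would condition the video model
```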
I am excited to share that my students @Kai__He, @yash2kant, @Dazitu_616, and Toshiya Yura, our previous research visitor from Sony, will present papers at #CVPR2025. 🎉 Check out their amazing work! 1/🧵
In the past 1.5 weeks, two papers from two different research groups have appeared that develop exactly the same (and embarrassingly simple) trick to improve the convergence of image/video diffusion models by 20-100+% (sic!) arxiv.org/abs/2502.14831 arxiv.org/abs/2502.09509
📢📢 Last week, we announced Pippo - a DiT that generates 1K res. turnarounds from a single iPhone photo (even occluded ones)! Here’s the deep dive thread unpacking everything we learned! ⬇️
🚀 Introducing Pippo – our diffusion transformer pre-trained on 3B Human Images and post-trained with 400M high-res studio images! ✨Pippo can generate 1K resolution turnaround video from a single iPhone photo! 🧵👀 Full deep dive thread coming up next!
Meta presents: Pippo: High-Resolution Multi-View Humans from a Single Image Generates 1K resolution, multi-view, studio-quality images from a single photo in a single forward pass
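For the "single forward pass" multi-view idea, a common pattern (assumed here, not necessarily Pippo's architecture) is to stack all target views into one token sequence with per-view camera embeddings, so the DiT's attention mixes information across views while denoising them jointly:

```python
import torch
import torch.nn as nn

class MultiViewDiTBlock(nn.Module):
    """Illustrative transformer block that attends across all views at once."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                 nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tokens, cam_emb):
        # tokens:  (B, V, N, D) patch tokens for V target views
        # cam_emb: (B, V, D)    per-view camera embedding
        B, V, N, D = tokens.shape
        x = (tokens + cam_emb.unsqueeze(2)).reshape(B, V * N, D)        # flatten views into one sequence
        x = x + self.attn(self.norm(x), self.norm(x), self.norm(x))[0]  # attention spans all views
        x = x + self.mlp(x)
        return x.reshape(B, V, N, D)
```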
Thrilled to announce that SG-I2V has been accepted at #ICLR2025 ! Huge thanks to the collaborators, reviewers, and ACs. Looking forward to presenting this in Singapore!