Jiatao Gu
@thoma_gu
Assistant Prof @CIS_Penn and ML Researcher at @Apple (MLR) | ex-FAIRer | PhD @HKUniversity | Research on generative AI for multimodal learning. I also speak Japanese.
I will be attending #ICML2025 Tue to Sat in Vancouver. Please come check out our oral presentation and spotlight poster on TARFlow on Thu: icml.cc/virtual/2025/p… Looking forward to chatting with old and new friends about next-gen generative models and world models!!
Thanks @9to5mac for summarizing our research on TARFlow/STARFlow! It is an exciting direction, reviving normalizing flows with modern scalable techniques… and more will come!
Apple Research just unearthed a forgotten AI technique and is using it to generate images 9to5mac.com/2025/06/23/app… by @mvcmendes
I like our Vid2Sim for two main reasons: 1. The inverse physics problem can be efficiently tackled through a generalized feed-forward prediction of physical properties + a lightweight optimization accelerated by the proposed Neural Jacobian. 2. Its handle-based 3D representation…
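The "feed-forward prediction + lightweight optimization" recipe can be illustrated with a toy sketch (this is not Vid2Sim's actual model or its Neural Jacobian; the spring system, function names, and values below are all hypothetical): an initial parameter guess is refined by Gauss-Newton steps that reuse a cheap analytic Jacobian of the forward simulation.

```python
import numpy as np

# Hypothetical toy inverse-physics problem: recover a scalar stiffness k
# from observed displacements. A spring's steady displacement under load F
# is x = F / k, so the Jacobian dx/dk = -F / k**2 is cheap to evaluate.

def simulate(k, forces):
    """Forward model: steady-state displacement per applied force."""
    return forces / k

def jacobian(k, forces):
    """Sensitivity of the displacements with respect to stiffness k."""
    return -forces / k**2

forces = np.array([1.0, 2.0, 3.0])
k_true = 5.0
observed = simulate(k_true, forces)

k = 1.0  # crude initial guess, e.g. from a feed-forward predictor
for _ in range(20):  # Gauss-Newton refinement using the analytic Jacobian
    r = simulate(k, forces) - observed   # residual against observations
    J = jacobian(k, forces)              # precomputed sensitivity
    k -= (J @ r) / (J @ J)               # least-squares parameter step

print(round(k, 3))  # → 5.0 (recovers the true stiffness)
```

The point of the sketch: because the Jacobian is available in closed form, each refinement step is a tiny linear solve rather than a full differentiable-simulation pass.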
Check out 🌟Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry & Physics for Mesh-Free Simulation #CVPR2025, from @LingjieLiu1’s lab at UPenn. Congrats to @MorPhLingXD! Vid2Sim aims to achieve system identification by reconstructing geometry, appearance,…
World Simulator, reimagined — now alive with humans, robots, and their vibrant society unfolding in 3D real-world geospatial scenes across the globe! 🚀 One day soon, humans and robots will co-exist in the same world. To prepare, we must address: 1️⃣ How can robots cooperate or…
Come visit our posters today and chat with us! 🕥 10:30–12:30 – Poster #153 🔹 Ego4o: Egocentric Human Motion Capture & Understanding from Multi-Modal Input 🔗 jianwang-mpi.github.io/ego4o/ 🕓 16:00–18:00 – Poster #37 🔹 Vid2Sim: Generalizable, Video-based Reconstruction of…
Please drop by and check our highlight poster tomorrow at #CVPR2025! ExHall D Poster #60 Sun 15 Jun 10:30 a.m. CDT — 12:30 p.m. CDT Great work by our @Apple intern @QihangZhang0224 and look forward to more exploration on explicit 3D generation! zqh0253.github.io/wvd/
Excited to share our paper "World-consistent Video Diffusion (WVD)" has been accepted at #CVPR2025! arxiv.org/abs/2412.01821 Huge congrats to our amazing intern @QihangZhang0224 and colleagues @zhaisf @itsbautistam @KJHMiao @alexttoshev & Josh Susskind!
Congrats @RickyTQChen on the nice work! This reminds me of our earlier work Levenshtein Transformer (x.com/thoma_gu/statu…) at FAIR! We learned a non-autoregressive insertion-deletion network for machine translation. Good memories from before the LLM era!
Padding in our non-AR sequence models? Yuck. 🙅 👉 Instead of unmasking, our new work *Edit Flows* performs iterative refinement via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
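To make the contrast with mask-based generation concrete, here is a minimal sketch of insert/delete edit operations on a token sequence (a toy illustration only, not the Edit Flows model; the edit encoding and example sentence are invented): unlike unmasking a fixed-length canvas, these operations let the sequence grow and shrink.

```python
# Toy edit operations: each edit is (op, index, token), where the index is
# relative to the *current* sequence, applied left to right.

def apply_edits(seq, edits):
    """Apply a list of insert/delete edits to a token sequence."""
    seq = list(seq)
    for op, idx, tok in edits:
        if op == "insert":
            seq.insert(idx, tok)   # place tok before position idx
        elif op == "delete":
            del seq[idx]           # remove token at idx (tok is ignored)
    return seq

draft = ["the", "cat", "cat", "sat"]
edits = [
    ("delete", 1, None),     # drop the duplicated "cat"
    ("insert", 3, "down"),   # grow the sequence: length is not fixed
]
print(apply_edits(draft, edits))  # → ['the', 'cat', 'sat', 'down']
```

With mask tokens the output length must be committed to up front; with inserts and deletes the model can correct both content and length during iterative refinement.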
Apple presents STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
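For readers new to normalizing flows, the core idea STARFlow builds on is the change-of-variables rule: an invertible map gives an exact log-likelihood. Below is a minimal one-layer illustration (not STARFlow itself, which stacks learned transformer-based flows in a latent space; the affine map and values here are invented for illustration).

```python
import numpy as np

# Change-of-variables rule behind normalizing flows:
#   log p(x) = log p_z(f(x)) + log |det df/dx|
# illustrated with a single invertible elementwise affine map.

def affine_flow(x, scale, shift):
    """Invertible map z = scale * x + shift, with its log-det-Jacobian."""
    z = scale * x + shift
    log_det = np.sum(np.log(np.abs(scale)))  # Jacobian is diagonal
    return z, log_det

def log_prob(x, scale, shift):
    """Exact log-likelihood of x under a standard-normal base density."""
    z, log_det = affine_flow(x, scale, shift)
    log_base = -0.5 * np.sum(z**2) - 0.5 * len(z) * np.log(2 * np.pi)
    return log_base + log_det

x = np.array([0.5, -1.0])
print(log_prob(x, scale=np.array([2.0, 0.5]), shift=np.array([0.0, 1.0])))
```

Because the likelihood is exact (no variational bound and no iterative denoising chain), scaling this family up with modern architectures is what makes the "revival" direction attractive.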
Feel free to drop by our talks at: June 11 Morning (202 B): vision-x-nyu.github.io/scalable-visio… June 11 Afternoon (Grand A2): generative-vision.github.io/workshop-CVPR-… June 12 Afternoon (103 A): vgm-cvpr.github.io
I will be attending #CVPR2025 and presenting our latest research from Apple MLR! Specifically, I will present our highlight poster, World-consistent Video Diffusion (cvpr.thecvf.com/virtual/2025/p…), and give three invited workshop talks, which include our recent preprint ★STARFlow★! (0/n)