Sangwoo Mo
@sangwoomo
Postdoc @UMich. Past: PhD @kaist_ai, Intern @AIatMeta, @NVIDIAAI. Work on scalable priors for vision, language, and robotics.
Can scaling data and models alone solve computer vision? 🤔 Join us at the SP4V Workshop at #ICCV2025 in Hawaii to explore this question! 🎤 Speakers: @danfei_xu, @joaocarreira, @jiajunwu_cs, Kristen Grauman, @sainingxie, @vincesitzmann 🔗 sp4v.github.io

Beyond excited to share FlowMo! We found that the latent representations of video models implicitly encode motion information and can guide the model toward coherent motion at inference time. Very proud of @ariel__shaulov and @itayhzn for this work! Plus, it’s open source! 🥳
🧵1/ Text-to-video models generate stunning visuals, but… motion? Not so much. You get extra limbs, objects popping in and out... In our new paper, we present FlowMo -- an inference-time method that reduces temporal artifacts without retraining or architectural changes. 👇
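For readers curious what an "inference-time method that reduces temporal artifacts without retraining" can look like in code, here is a rough, hypothetical sketch: a coherence penalty on the video latents whose gradient nudges each denoising step. The penalty, function names, and scale below are illustrative assumptions, not FlowMo's actual formulation (see the paper and open-source code for that).

```python
import torch

def temporal_coherence_loss(latents):
    # latents: (T, C, H, W) video latents; penalize large frame-to-frame
    # changes as a crude, illustrative proxy for incoherent motion.
    diffs = latents[1:] - latents[:-1]
    return diffs.pow(2).mean()

def guided_denoise_step(denoiser, latents, t, guidance_scale=0.1):
    # One denoising step with an extra inference-time gradient nudge toward
    # temporally coherent latents; the base model itself is left untouched.
    latents = latents.detach().requires_grad_(True)
    noise_pred = denoiser(latents, t)  # standard denoiser call (assumed API)
    grad = torch.autograd.grad(temporal_coherence_loss(latents), latents)[0]
    return noise_pred + guidance_scale * grad
```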
Excited to speak at the SP4V Workshop! I think this topic is actually quite nuanced, and I'm looking forward to sharing our group's experience in trying to learn geometry and structure from data!
Our computer vision textbook is now available for free online: visionbook.mit.edu. We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful, and feel free to submit GitHub issues to help us improve the text!
Q-learning is not yet scalable: seohong.me/blog/q-learnin… I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
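For context on what is being scaled, this is the textbook tabular Q-learning update; the blog's argument concerns how the bootstrapped max target behaves with deep networks and long horizons, not this toy form.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    # Standard off-policy TD update:
    #   Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    bootstrap = 0.0 if done else gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (r + bootstrap - Q[s, a])
    return Q
```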
Thank you to everyone who participated in our workshop!
Are Vision Foundation Models ready to tackle pixel-level tasks? 🖼️ Join us at the Pixel-level Vision Foundation Models (PixFoundation) Workshop at #CVPR2025! We’re excited to introduce an outstanding lineup of invited speakers. Meet them below 👇
As a video gaming company, @Krafton_AI has secretly been cooking something big with @NVIDIAAI for a while! 🥳 We introduce Orak, the first comprehensive video gaming benchmark for LLMs! arxiv.org/abs/2506.03610
Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with an open-source code to run your own humanoid RL experiments in no time! Thread below 🧵
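FastTD3 builds on TD3; as a reference point, here is the core TD3 target computation (target policy smoothing plus clipped double Q-learning). The FastTD3-specific choices that make it fast live in the thread and the open-source code, not in this sketch, and the function names here are placeholders.

```python
import torch

def td3_targets(critic1_t, critic2_t, actor_t, next_obs, rewards, dones,
                gamma=0.99, noise_std=0.2, noise_clip=0.5):
    # Standard TD3 target: add clipped noise to the target action
    # (target policy smoothing), then take a min over twin target critics.
    with torch.no_grad():
        next_actions = actor_t(next_obs)
        noise = (torch.randn_like(next_actions) * noise_std).clamp(-noise_clip, noise_clip)
        next_actions = (next_actions + noise).clamp(-1.0, 1.0)
        q_min = torch.min(critic1_t(next_obs, next_actions),
                          critic2_t(next_obs, next_actions))
        return rewards + gamma * (1.0 - dones) * q_min
```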
Our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ @redstone_hong @junyi42 @davidrmcall).
We took a short break from robotics to build a human-level agent to play Competitive Pokémon. Partially observed. Stochastic. Long-horizon. Now mastered with Offline RL + Transformers. Our agent, trained on 475k+ human battles, hits the top 10% on Pokémon Showdown leaderboards.…
Can visual SSL match CLIP on VQA? Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params) which is trained purely on web images – without any language supervision.
In the past, we extended the convolution operator to go from low-level image processing to high-level visual reasoning. Can we also extend physical operators for more high-level physical reasoning? Introducing the Denoising Hamiltonian Network (DHN): arxiv.org/pdf/2503.07596
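To make "physical operator" concrete: below is the classical Hamiltonian update that Hamiltonian-network-style models parameterize with a learned energy function. It is background for the tweet, not the DHN architecture itself; see the linked paper for what "denoising" adds on top.

```python
import torch

def hamiltonian_step(H, q, p, dt=0.01):
    # One explicit Euler step of Hamilton's equations,
    #   dq/dt = dH/dp,   dp/dt = -dH/dq,
    # where H is a (possibly learned) scalar energy function of (q, p).
    q = q.detach().requires_grad_(True)
    p = p.detach().requires_grad_(True)
    energy = H(q, p).sum()
    dH_dq, dH_dp = torch.autograd.grad(energy, (q, p))
    return (q + dt * dH_dp).detach(), (p - dt * dH_dq).detach()
```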
Best wishes for your ICCV submissions, and congrats again on your @CVPR papers! Please share your latest work with the workshop. CVPR dual submissions are allowed, so just reupload them. This is a last-minute call: the deadline is tomorrow! 🔥
Call for Papers: #CVPR2025 PixFoundation Workshop! Please share your accepted papers at CVPR and submissions at ICCV! 🔥 📅 Deadline (updated): March 9, 2025 sites.google.com/view/pixfounda…
Excited to share our work on the Diffusion Forcing Transformer—a flexible model that can generate videos from any number of images! We introduce History Guidance to boost quality, consistency, and dynamics, along with capabilities like OOD generalization and long stable rollouts!
Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website: boyuan.space/history-guidan… (1/7)
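Since History Guidance is described as "a simple add-on to any existing video diffusion model," here is a rough sketch of the general classifier-free-guidance-style recipe it resembles: combine a history-conditioned and a history-dropped denoiser prediction. All names are placeholders, and DFoT's actual History Guidance variants differ; see the project page for the real formulation.

```python
import torch

def history_guided_eps(denoiser, noisy_frames, t, history, null_history, w=2.0):
    # CFG-style combination of a history-conditioned and a history-dropped
    # prediction; `denoiser`, `history`, and `null_history` are assumed APIs.
    eps_cond = denoiser(noisy_frames, t, context=history)
    eps_uncond = denoiser(noisy_frames, t, context=null_history)
    return eps_uncond + w * (eps_cond - eps_uncond)
```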