Sicheng Mo
@sicheng_mo
Graduate student at UCLA. Interested in ML and CV.
#ICCV2025 Introducing X-Fusion: Introducing New Modality to Frozen Large Language Models. It is a novel framework that adapts pretrained LLMs (e.g., LLaMA) to new modalities (e.g., vision) while retaining their language capabilities and world knowledge! (1/n) Project Page:…
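(A minimal sketch of the general "frozen LLM + trainable new-modality branch" recipe the tweet describes, not the actual X-Fusion architecture. The class name, the vision tower, and the HuggingFace-style `inputs_embeds` call are all assumptions for illustration.)

```python
import torch
import torch.nn as nn

class FrozenLMWithVision(nn.Module):
    """Wrap a pretrained LLM, freeze it, and bolt on a trainable vision branch."""

    def __init__(self, llm: nn.Module, vision_tower: nn.Module, vision_dim: int, llm_dim: int):
        super().__init__()
        self.llm = llm
        for p in self.llm.parameters():
            p.requires_grad = False                 # language weights stay frozen
        self.vision_tower = vision_tower            # trainable: pixels -> patch features
        self.proj = nn.Linear(vision_dim, llm_dim)  # trainable: map into the LLM token space

    def forward(self, image, text_embeds):
        vis_tokens = self.proj(self.vision_tower(image))
        # Prepend visual tokens to the text sequence; only the new branch receives gradients.
        return self.llm(inputs_embeds=torch.cat([vis_tokens, text_embeds], dim=1))
```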
What exactly is a "world model"? And what keeps existing video generation models from being true world models? In my new blog post, I argue that a true video world model must be causal, interactive, persistent, real-time, and physically accurate. xunhuang.me/blogs/world_mo…
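(A sketch of the interface those five properties imply; class and method names are illustrative, not taken from the blog post.)

```python
class InteractiveWorldModel:
    """Queried step by step, conditioned only on the past (causal), accepts
    actions (interactive), keeps state across calls (persistent), and must
    return each frame fast enough for real-time use."""

    def reset(self, context_frames):
        """Initialize internal state from observed frames; no access to the future."""
        ...

    def step(self, action):
        """Advance one step given an action and return the next frame.
        State persists, so previously seen content stays consistent when revisited."""
        ...
```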
@CVPR is around the corner!! Join us at the Workshop on T4V at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yang4ai, @WenhuChen, @roeiherzig, Yuheng Li, Kristen Grauman) covering diverse topics! Website: sites.google.com/view/t4v-cvpr2… #CVPR #Transformer #Vision #T4V2025 #T4V
Join our #CVPR2025 Workshop on Real2Sim: Bridging the Gap between Neural Rendering and Robot Learning on 6/12! With amazing speakers: @drmapavone @shahdhruv_ @GordonWetzstein @LingjieLiu1 @sicheng_mo @RuohanZhang76 @carlo_sferrazza ⏲️ Thu, 6/12, 1:45-5:30 PM CDT 🏢 Davidson…
✨New preprint: Dual-Process Image Generation! We distill *feedback from a VLM* into *feed-forward image generation*, at inference time. The result is flexible control: parameterize tasks as multimodal inputs, visually inspect the images with the VLM, and update the generator.🧵
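(A rough sketch of the inference-time loop as I read the summary: the VLM scores the generated image against the task prompt, and that feedback is backpropagated into a small set of generator weights, e.g. LoRA parameters. `generator`, `vlm_score`, and `lora_params` are placeholder names, not the paper's API.)

```python
import torch

def dual_process_step(generator, vlm_score, prompt, lora_params, lr=1e-3):
    opt = torch.optim.Adam(lora_params, lr=lr)
    image = generator(prompt)             # feed-forward generation
    loss = -vlm_score(image, prompt)      # VLM "visually inspects" the result
    loss.backward()                       # distill the feedback into the generator
    opt.step()
    opt.zero_grad()
    return image
```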
Oh great, it was accepted to #ICLR2025 as a Spotlight paper!
We release a new urban simulator, MetaUrban, to support research on AI agents for micromobility. The work will be presented at #ICLR2025, and the demo code can run on any laptop. Webpage: metadriverse.github.io/metaurban/ Code: github.com/metadriverse/m… Paper: arxiv.org/pdf/2407.08725
Mitigating racial bias in LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethics reviews for authors, but not for invited speakers? 😡
Come by our poster tomorrow :D
Stop by our Ctrl-X poster this week at #NeurIPS2024 :D Wednesday Dec. 11th, 4:30–7:30 PM, East Exhibit Hall A–C #1605 Come say hi!
🚀 Introducing CAT4D! 🚀 CAT4D transforms any real or generated video into dynamic 3D scenes with a multi-view video diffusion model. The outputs are dynamic 3D models that we can freeze and look at from novel viewpoints, in real-time! Be sure to try our interactive viewer!
🚀 I am recruiting PhD students for Fall 2025 at the UCLA Robot Intelligence Lab! 🤖 If you are interested in robot learning and human-robot interaction, list me as a potential advisor when you apply to the UCLA CS PhD program! #PhD #Robotics @CS_UCLA
Ctrl-X was accepted to #NeurIPS2024! We present a guidance-free structure and appearance control method for any pre-trained diffusion model. Paper, code, and results: genforce.github.io/ctrl-x It was awesome collaborating with @sicheng_mo @BenKlingher Fangzhou Mu @zhoubolei :D
Very excited to get this out: “DVT: Denoising Vision Transformers”. We've identified and combated those annoying positional patterns in many ViTs. Our approach denoises them, achieving SOTA results and stunning visualizations! Learn more on our website: jiawei-yang.github.io/DenoisingViT/
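(A back-of-the-envelope way to see the artifact DVT targets: averaging ViT patch features over many images washes out image content but keeps anything tied to patch position. This is only a diagnostic sketch with placeholder names, not the paper's denoising method.)

```python
import torch

def positional_artifact_estimate(vit, images):
    """images: (N, 3, H, W); vit(x) -> (B, num_patches, dim) patch features."""
    with torch.no_grad():
        feats = torch.stack([vit(img.unsqueeze(0)).squeeze(0) for img in images])
    return feats.mean(dim=0)  # (num_patches, dim): the position-dependent component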
In case you were wondering what’s going on with the back of the #CVPR2024 T-shirt: it’s a hybrid image made by @invernopark and @dangengdg! When you look at it up close, you’ll just see the Seattle skyline, but when you view it from a distance, the text “CVPR” should appear.
So many new LLM architectures (Mambas🐍, Transformers🤖,🦙,🦔, Hyenas🐺,🦓…), so little GPU time to combine them into hybrid LLMs… Good news! Today we release Manticore, a system for creating **pretrained hybrids** from pretrained models! 👨🌾🦁🦂 arxiv.org/pdf/2406.00894 1/n
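(A toy guess at what a "pretrained hybrid" could look like: alternate blocks taken from two pretrained backbones, say a Transformer and a Mamba, with small linear projectors gluing their feature spaces together. This is my reading of the one-line summary, not Manticore's actual implementation.)

```python
import torch.nn as nn

class HybridBackbone(nn.Module):
    """Interleave pretrained blocks from two architectures behind projectors."""

    def __init__(self, blocks_a, blocks_b, dim_a, dim_b):
        super().__init__()
        layers = []
        for a, b in zip(blocks_a, blocks_b):
            layers += [a, nn.Linear(dim_a, dim_b), b, nn.Linear(dim_b, dim_a)]
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)
```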