Xichen Pan
@xichen_pan
CS Ph.D. Student @NYU_Courant, Visiting Researcher @metaai | Prev: @MSFTResearch, @AlibabaGroup, http://horizon.cc, @sjtu1896
We find that training unified multimodal understanding and generation models is surprisingly easy: you do not need to tune the MLLM at all. The MLLM's knowledge, reasoning, and in-context learning can be transferred from multimodal understanding (text output) to generation (pixel output) even when it is FROZEN!
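A minimal sketch of the recipe described above, assuming the usual setup of learnable query tokens passed through a frozen MLLM and a small trainable connector into a diffusion decoder; all module names and interfaces here are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn as nn

class FrozenMLLMToGeneration(nn.Module):
    """Illustrative sketch: keep the MLLM frozen and train only learnable
    query tokens plus a connector feeding a diffusion decoder."""

    def __init__(self, mllm, diffusion_decoder, num_queries=64, mllm_dim=4096, cond_dim=1024):
        super().__init__()
        self.mllm = mllm                      # pretrained multimodal LLM, kept frozen
        for p in self.mllm.parameters():
            p.requires_grad = False

        # learnable query tokens appended to the prompt (trainable)
        self.queries = nn.Parameter(torch.randn(1, num_queries, mllm_dim) * 0.02)
        # small connector projecting MLLM hidden states into the decoder's conditioning space (trainable)
        self.connector = nn.Sequential(
            nn.Linear(mllm_dim, cond_dim), nn.GELU(), nn.Linear(cond_dim, cond_dim)
        )
        self.diffusion_decoder = diffusion_decoder  # e.g. a latent diffusion model (assumed interface)

    def forward(self, prompt_embeds, images, timesteps):
        b = prompt_embeds.size(0)
        # run the frozen MLLM over [prompt tokens ; learnable queries]
        inputs = torch.cat([prompt_embeds, self.queries.expand(b, -1, -1)], dim=1)
        hidden = self.mllm(inputs_embeds=inputs).last_hidden_state
        # hidden states at the query positions become the generation condition
        cond = self.connector(hidden[:, -self.queries.size(1):])
        # standard diffusion training loss, conditioned on the projected queries
        return self.diffusion_decoder.loss(images, cond, timesteps)
```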

Thanks for bringing this to my attention. I honestly wasn’t aware of the situation until the recent posts started going viral. I would never encourage my students to do anything like this—if I were serving as an Area Chair, any paper with this kind of prompt would be…
MetaQuery is now open source, with both the code and data available.
The code and instruction-tuning data for MetaQuery are now open-sourced! Code: github.com/facebookresear… Data: huggingface.co/collections/xc… Two months ago, we released MetaQuery, a minimal training recipe for SOTA unified understanding and generation models. We showed that tuning few…
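The Hugging Face link above is truncated, so the dataset id below is only a placeholder; a minimal loading sketch with the datasets library would look roughly like this:

```python
from datasets import load_dataset

# Placeholder repo id: the collection link in the tweet is truncated, so substitute
# the actual dataset name from the released Hugging Face collection.
ds = load_dataset("facebookresearch/metaquery-instruction-tuning", split="train")  # hypothetical id

# Field names depend on the release; inspect one instruction-tuning example first.
print(ds[0].keys())
```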
Heading to #NeurIPS2024 to present Cambrian-1 w/ @TongPetersb! Catch our oral presentation Friday @ 10am (Oral 5C) and our poster afterwards until 2pm (#3700 in East Hall A-C) 🪼🎉
🚨 New VLM Paper! Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion 1️⃣ Are CLIP-style vision transformers the best vision encoder for VLMs? We explore new possibilities with Florence-2, a generative vision foundation model,…
SV3D takes an image as input and outputs camera-controlled novel views that are highly consistent across views. We also propose techniques to convert these novel views into high-quality 3D meshes. The view-synthesis models are publicly released. Project page: sv3d.github.io
Today, we are releasing Stable Video 3D, a generative model based on Stable Video Diffusion. This new model advances the field of 3D technology, delivering greatly improved quality and multi-view consistency. The model is available now for commercial and non-commercial use with a Stability…
TLDR: Meet ✨Lumiere✨, our new text-to-video model from @GoogleAI! Lumiere is designed to create entire clips in just one go, seamlessly opening up possibilities for many applications: image-to-video 🖼️, stylized generation 🖌️, video editing 🪩, and beyond. See 🧵👇
Introducing the Scalable Interpolant Transformer! SiT integrates a flexible interpolant framework into DiT, enabling a nuanced exploration of dynamical transport in image generation. With an FID of 2.06 on ImageNet 256, SiT pushes interpolant-based models to new heights! (1/n)
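For context, a velocity-matching sketch of the interpolant objective the tweet refers to, assuming the simple linear interpolant x_t = (1 - t)·x + t·ε; SiT itself studies several interpolant, prediction, and sampler choices, so treat this as a generic illustration rather than the paper's exact objective:

```python
import torch

def interpolant_velocity_loss(model, x_data, t=None):
    """Velocity matching under an assumed linear interpolant
    alpha_t = 1 - t, sigma_t = t (one of the choices SiT explores)."""
    b = x_data.size(0)
    if t is None:
        t = torch.rand(b, device=x_data.device)            # t ~ U(0, 1)
    noise = torch.randn_like(x_data)
    t_ = t.view(b, *([1] * (x_data.dim() - 1)))             # broadcast t over image dims

    # interpolant: x_t = alpha_t * x + sigma_t * eps, here alpha_t = 1 - t, sigma_t = t
    x_t = (1.0 - t_) * x_data + t_ * noise
    # velocity target: d/dt x_t = alpha_t' * x + sigma_t' * eps = -x + eps
    target = noise - x_data

    pred = model(x_t, t)                                    # transformer predicts the velocity field
    return torch.mean((pred - target) ** 2)
```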
Delighted to announce that our Kosmos-G has been accepted to ICLR 2024. Thanks to my mentor @donglixp at @MSFTResearch. We are working on integrating Kosmos-G into diffusers. Looking forward to meeting you in Vienna!
Check out our work on zero-shot subject-driven generation. Now you can prompt Stable Diffusion with not only text but also images! Sampling speed stays close to the original SD. Project page: xichenpan.com/kosmosg/ Code: aka.ms/Kosmos-G
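A hedged sketch of how this kind of image-plus-text prompting can drive Stable Diffusion through the `prompt_embeds` argument in diffusers; `mllm` and `aligner` below are placeholder components standing in for the multimodal encoder and the projection into SD's conditioning space, not the released Kosmos-G API:

```python
import torch
from diffusers import StableDiffusionPipeline

def subject_driven_generate(mllm, aligner, text, subject_images):
    """Sketch: condition Stable Diffusion on an interleaved text+image prompt.
    `mllm` and `aligner` are assumed interfaces: the MLLM consumes interleaved
    text and images, and the aligner maps its hidden states to the
    (batch, 77, 768) tensor SD expects as text conditioning."""
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    hidden = mllm(text=text, images=subject_images)   # assumed MLLM call
    cond = aligner(hidden)                            # -> (1, 77, 768) conditioning

    # Only the conditioning changes, so sampling cost stays close to vanilla SD.
    return pipe(prompt_embeds=cond).images[0]
```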
One thing I really like about #ICLR is that its review data are open to everyone! 🥰 Same as last year, I collected the #ICLR2024 review data, adapting my previous script to the OpenReview API v2. Here are the histograms based on 7,331 submissions. Hope this helps! 🙋
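A sketch of how such a collection script might look with the openreview-py API v2 client; the venue invitation id follows the usual pattern for ICLR 2024, but the exact reply/rating field layout is an assumption, so adjust to what the API actually returns:

```python
import openreview
import matplotlib.pyplot as plt

# API v2 client (no login needed for public ICLR reviews).
client = openreview.api.OpenReviewClient(baseurl="https://api2.openreview.net")

# Fetch all submissions with their replies (reviews, comments, decisions).
submissions = client.get_all_notes(
    invitation="ICLR.cc/2024/Conference/-/Submission", details="replies"
)

ratings = []
for note in submissions:
    for reply in note.details["replies"]:
        if any(inv.endswith("Official_Review") for inv in reply["invitations"]):
            # Assumed field layout: API v2 wraps values as {"value": ...} and the
            # rating string starts with the numeric score, e.g. "6: marginally above ...".
            ratings.append(int(reply["content"]["rating"]["value"].split(":")[0]))

plt.hist(ratings, bins=range(1, 12))
plt.xlabel("reviewer rating")
plt.ylabel("count")
plt.title("ICLR 2024 review ratings")
plt.savefig("iclr2024_ratings.png")
```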