Jiuhai Chen
@JiuhaiC
CS PhD student @ UMD. Ex-intern @Meta @Microsoft @Amazon. On the industry job market
🚀 Introducing BLIP3-o: A Family of Fully Open Unified Multimodal Models arxiv.org/pdf/2505.09568 🔓 Attempting to unlock GPT-4o’s image generation. We open-source everything, including 25 million pre-training samples!


Super excited to attend #CVPR2025 in person! Catch our spotlight talk on BLIP3-o at the Computer Vision in the Wild workshop 👉 computer-vision-in-the-wild.github.io/cvpr-2025/ Also check out Florence-VL at poster #372, Sunday 10:30–12:30
🌊Tried BLIP3-o? Our family of unified multimodal models is making waves, now open-sourced for the AI Research community. 🔓 Github Repo: bit.ly/4muUBzm 🤗 Models: bit.ly/4kB9oXK 🪧 Demo: bit.ly/4jb0YVD 📰 News: bit.ly/3Z1tuC8 ✍️ Blog:…
Check out BLIP3-o among the notable AI models of the week!
10 notable AI models of the week: ▪️ Aya Vision ▪️ INTELLECT-2 ▪️ MiniMax-Speech ▪️ SWE-1 ▪️ Seed1.5-VL ▪️ BLIP3-o ▪️ Skywork-VL ▪️ Behind Maya ▪️ MiMo ▪️ AM-Thinking-v1 🧵
Introducing 🔥BLIP3-o🔥 -- A Family of Fully Open Unified Multimodal Models for Both Image Understanding and Image Generation 📊Paper: arxiv.org/pdf/2505.09568 🤗Models and Datasets: huggingface.co/BLIP3o 🧠Code: github.com/JiuhaiChen/BLI… 💻Demo: blip3o.salesforceresearch.ai We…
Our gradio demo for BLIP3-o: huggingface.co/spaces/BLIP3o/… using the open-source checkpoint: huggingface.co/BLIP3o/BLIP3o-…
Our first attempt at unlocking GPT-4o’s image generation — more to come in the next few weeks!
We find training unified multimodal understanding and generation models is so easy that you do not need to tune the MLLM at all. The MLLM's knowledge, reasoning, and in-context learning transfer from multimodal understanding (text output) to generation (pixel output) even when it is FROZEN!
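For intuition, here is a minimal sketch of the frozen-MLLM recipe above: a small trainable head is optimized with a rectified-flow (flow-matching) objective to produce image features conditioned on the frozen MLLM's hidden states, so gradients never touch the backbone. The head architecture, dimensions, Hugging Face-style model interface, and loss details are illustrative assumptions, not the released BLIP3-o code.

```python
# Hedged sketch only: module names, dimensions, and the conditioning interface
# are assumptions for illustration, not the BLIP3-o implementation.
import torch
import torch.nn as nn

class FlowMatchingHead(nn.Module):
    """Small trainable head that denoises image features conditioned on frozen MLLM states."""
    def __init__(self, feat_dim=1152, cond_dim=4096, hidden_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + cond_dim + 1, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, x_t, t, cond):
        # Concatenate noisy features, timestep, and the frozen-MLLM condition.
        t = t[:, None, None].expand(-1, x_t.size(1), 1)
        return self.net(torch.cat([x_t, cond, t], dim=-1))

def training_step(frozen_mllm, head, optimizer, input_ids, image_feats):
    # The MLLM stays frozen: only the head receives gradients.
    with torch.no_grad():
        # Hugging Face-style call assumed for illustration.
        cond = frozen_mllm(input_ids, output_hidden_states=True).hidden_states[-1]
    cond = cond[:, -image_feats.size(1):, :]          # states at the image-token positions (assumed layout)
    t = torch.rand(image_feats.size(0), device=image_feats.device)
    noise = torch.randn_like(image_feats)
    x_t = (1 - t)[:, None, None] * noise + t[:, None, None] * image_feats
    v_target = image_feats - noise                     # rectified-flow velocity target
    v_pred = head(x_t, t, cond)
    loss = nn.functional.mse_loss(v_pred, v_target)    # trains the head only; backbone is frozen
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```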
Florence-VL is accepted by #CVPR2025. Thanks to all coauthors! BTW, a very powerful multimodal model for image understanding & generation is coming soon, stay tuned! 🚀🔥
🚨 New VLM Paper! Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion 1️⃣ Are CLIP-style vision transformers the best vision encoder for VLMs? We explore new possibilities with Florence-2, a generative vision foundation model,…
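For a rough sense of the depth-breadth fusion idea, here is a minimal sketch: features extracted from Florence-2 under several prompts ("breadth") and at several depths are concatenated along the channel dimension and projected by an MLP into the LLM's token space. The class name, dimensions, and the concatenate-then-project fusion below are illustrative assumptions, not Florence-VL's actual implementation.

```python
# Hedged sketch of depth-breadth fusion; names and sizes are assumptions.
import torch
import torch.nn as nn

class DepthBreadthFusion(nn.Module):
    def __init__(self, num_views=4, vis_dim=1024, llm_dim=4096):
        super().__init__()
        # One MLP projector over the channel-concatenated feature views.
        self.proj = nn.Sequential(
            nn.Linear(num_views * vis_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, feature_views):
        # feature_views: list of [batch, num_patches, vis_dim] tensors, one per
        # prompt/layer combination extracted from the vision encoder.
        fused = torch.cat(feature_views, dim=-1)   # concatenate along channels
        return self.proj(fused)                    # [batch, num_patches, llm_dim]

# Usage with dummy features standing in for Florence-2 outputs:
views = [torch.randn(2, 576, 1024) for _ in range(4)]
visual_tokens = DepthBreadthFusion()(views)        # visual tokens fed to the LLM
```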
This project really changed how I think about multimodal models and LLMs. I used to believe that multimodal (visual) prediction required significant changes to the model and heavy pretraining, like Chameleon. But surprisingly, the opposite is true! In large autoregressive models,…
How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph, a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Only a modest amount of generation data is needed to elicit…
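A hedged sketch of the MetaMorph-style idea above: the LLM keeps its usual next-token text loss while also regressing continuous visual embeddings at image positions, so generation piggybacks on the understanding backbone. The heads, the cosine-similarity loss, and the masking scheme are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch: joint text + visual-embedding prediction. All names and the
# loss choice are assumptions for illustration.
import torch
import torch.nn as nn

def joint_loss(text_head, vision_head, hidden_states, text_labels, visual_targets, visual_mask):
    # hidden_states: [batch, seq, dim] from the language-model backbone.
    # Text positions: standard next-token cross-entropy.
    text_logits = text_head(hidden_states)
    text_loss = nn.functional.cross_entropy(
        text_logits.flatten(0, 1), text_labels.flatten(), ignore_index=-100
    )
    # Image positions: regress continuous visual embeddings (e.g. vision-encoder features).
    pred_vis = vision_head(hidden_states[visual_mask])            # [num_visual_tokens, vis_dim]
    cos = nn.functional.cosine_similarity(pred_vis, visual_targets, dim=-1)
    visual_loss = (1.0 - cos).mean()
    return text_loss + visual_loss
```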
Try our Florence-VL demo!
📣 Microsoft Research releases Florence-VL, a new family of MLLMs powered by the generative vision foundation model Florence-2. Achieves significant improvements in general VQA, perception, hallucination, OCR, chart, knowledge-intensive understanding, and more 🔥 Learn more 👇