Yue Fan
@YFan_UCSC
PhD student at University of California, Santa Cruz (UCSC)
Before o3 impressed everyone with visual reasoning, we already believed in and were exploring models that can think with images. Here's our shot, GRIT: Grounded Reasoning with Images & Texts, which trains MLLMs to think while performing visual grounding. It is done via RL…

Beating OpenAI is not as hard as you think. If you don't believe you can compete, you've already lost. Winning starts with mindset. Introducing Agent S2, the world's best computer-use AI agent, and the second…
Introducing Agent S2: our newest open-source AI agent setting new records in computer & smartphone use! We are seeing Agent S2 solve a whole new range of tasks, pushing the boundaries of AI-driven autonomy. Why it's special: #1 in OSWorld (34.5% accuracy at 50 steps, …
Building Multimodal o1? Introducing Multimodal Inconsistency Reasoning (MMIR), a new testbed to reason about multimodal inconsistencies, which requires a new perception & reasoning. Why…
New Paper Alert: Multimodal Inconsistency Reasoning (MMIR)! Ever visited a webpage where the text says "IKEA desk" yet images and descriptions elsewhere show a totally different brand? Or read a slide that shows "50% growth" in the text but the accompanying chart looks flat?…
R1 Safety Paper Alert! How safe are large reasoning models like R1? What is their safety behavior? Does their enhanced capability introduce greater risks? We present a comprehensive safety analysis of large reasoning models. Key Findings: 1. Open-source R1 models lag…
Check out the "Mojito"! A new video generation work led by my labmate Xuehai
New Video Generation Paper Alert! How can we make video diffusion models capable of integrating directional guidance and controllable motion-intensity guidance? We propose Mojito: Motion Trajectory and Intensity Control for Video Generation. Project page: …