Yue Fan
@YFan_UCSC
PhD student at University of California, Santa Cruz (UCSC)
Before o3 impressed everyone with visual reasoning, we already believed in and were exploring models that can think with images. Here's our shot, GRIT: Grounded Reasoning with Images & Texts, which trains MLLMs to think while performing visual grounding. It is done via RL…

Beating OpenAI is not as hard as you think. If you don't believe you can compete, you've already lost. Winning starts with mindset. Introducing Agent S2, the world's best computer-use AI agent, and the second…
Introducing Agent S2: our newest open-source AI agent setting new records in computer & smartphone use! We are seeing Agent S2 solve a whole new range of tasks, pushing the boundaries of AI-driven autonomy. Why it's special: #1 in OSWorld (34.5% accuracy at 50 steps, …
Building Multimodal o1? Introducing Multimodal Inconsistency Reasoning (MMIR), a new testbed to reason about multimodal inconsistencies, which requires a new perception & reasoning. Why…
New Paper Alert: Multimodal Inconsistency Reasoning (MMIR)! Ever visited a webpage where the text says "IKEA desk" yet images and descriptions elsewhere show a totally different brand? Or read a slide that shows "50% growth" in the text but the accompanying chart looks flat?…
R1 Safety Paper Alert! How safe are large reasoning models like R1? What is their safety behavior? Does their enhanced capability introduce greater risks? We present a comprehensive safety analysis of large reasoning models. Key Findings: 1. Open-source R1 models lag…
Check out the "Mojito"! A new video generation work led by my labmate Xuehai
New Video Generation Paper Alert! How can we make video diffusion models capable of integrating directional guidance and controllable motion-intensity guidance? We propose Mojito: Motion Trajectory and Intensity Control for Video Generation. Project page: …