Rui Yang
@RuiYang70669025
PhD student @ UIUC
🤖Can MLLM agents reason about spatial relationships and plan atomic actions for navigation & manipulation? 🔥 Meet EmbodiedBench 🏆—the first fine-grained benchmark for MLLM-based embodied agents! 📄 Paper: arxiv.org/abs/2502.09560 🌐 Website & code: embodiedbench.github.io

🔥Our Goedel-Prover-V2-32B topped the PutnamBench Leaderboard by solving 86 problems, nearly 2× the previous SOTA DeepSeek-Prover-V2-671B (47 solved), while using:
* 1/20 the model size (32B vs. 671B)
* 1/5 the passes (184 vs. 1024)
Meanwhile, we also release *…
🚀 RL is powering breakthroughs in LLM alignment, reasoning, and agentic apps. Are you ready to dive into the RL x LLM frontier? Join us at @aclmeeting ACL’25 tutorial: Inverse RL Meets LLM Alignment this Sunday at Vienna🇦🇹(Jul 27th, 9am) 📄 Preprint at huggingface.co/papers/2507.13…
Grateful for the chance to present EmbodiedBench at ICML as an Oral. A rewarding experience full of learning. Thanks to @RuiYang70669025 @hengjinlp @jyzhang1208 @huan_zhang12 Mark_Zhao @ManlingLi_ Tong_Zhang and many others who made it possible. See you next time.
🚀 Introducing MA-LoT Theorem Framework: An open-source multi-agent framework utilizing Long Chain-of-Thought to boost automated theorem proving🎉 ✅ Achieving 61.07% accuracy under pass@32 on MiniF2F-Test, outperforming Goedel-Prover, Lean_STP, and DeepSeek-Prover-V1.5…
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…
Learning to perceive while learning to reason! We introduce PAPO: Perception-Aware Policy Optimization, a direct upgrade to GRPO for multimodal reasoning. PAPO relies on internal supervision signals. No extra annotations, reward models, or teacher models needed. 🧵1/3
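The group-relative baseline that GRPO (and hence PAPO) builds on can be sketched in a few lines. This is a toy illustration only, assuming PAPO layers its perception-aware supervision on top of an objective like this; the function name is illustrative, not from the paper's code.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled response's reward
    by the mean and std of its group. No learned value network needed,
    which is part of why PAPO can add its perception term without
    extra reward or teacher models (a hedged reading of the tweet)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mean) / std for r in rewards]
```

For a group of four rollouts with rewards `[1.0, 0.0, 1.0, 0.0]`, the correct responses get positive advantage and the wrong ones negative, summing to zero.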
🤩Mind-blowing discovery: Random policies can be surprisingly powerful for decision-making! Our ICML 2025 paper reveals how simple randomness leads to sophisticated reward-matching policies. Let me break this down...
Awesome work! 🥂 I feel like the design of our GUI-Actor — which can propose multiple candidate regions in one forward pass — combined with a Grounding Verifier could work really well within the 'test-time scaling' framework of GTA1! 😀
Insightful post on the scalability of off-policy RL.
Q-learning is not yet scalable seohong.me/blog/q-learnin… I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
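For readers skimming past the blog link: the "off-policy" property the post debates comes from the max in the TD target, which bootstraps from the greedy policy while data is collected however you like. A minimal tabular sketch, with `env_step` as a stand-in environment interface (not anything from the linked post):

```python
import random

def q_learning(env_step, n_states, n_actions, episodes=300,
               alpha=0.1, gamma=0.99, eps=0.2, seed=0):
    """Tabular Q-learning, the simplest off-policy TD control method.
    env_step(s, a) -> (next_state, reward, done) is a hypothetical
    environment callable used for illustration."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # behavior policy: eps-greedy over current Q
            a = (rng.randrange(n_actions) if rng.random() < eps
                 else max(range(n_actions), key=lambda x: Q[s][x]))
            s2, r, done = env_step(s, a)
            # target policy: greedy (the max) -- this is the off-policy part
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

def env_step(s, a):
    """Two-state toy chain: action 1 reaches the goal (reward 1), action 0 stays."""
    if a == 1:
        return 1, 1.0, True
    return 0, 0.0, False
```

On this toy chain, `Q[0][1]` converges toward 1.0 and dominates `Q[0][0]`, so the learned greedy policy takes the goal action.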
🧵 1/7 Should AI agents "think more" or "do more"? 🤔 The current trend is to scale test-time compute, making agents generate longer reasoning traces. But what if that’s the wrong approach for interactive tasks? In our new work, we argue for a new scaling dimension: Test-Time…
🚀 Can LLMs stop overthinking when detailed reasoning isn't needed? Excited to share our latest work on LLM reasoning: AutoL2S 🧠⚡ 📄 Paper: arxiv.org/abs/2505.22662 🤖 Model: huggingface.co/amandaa/AutoL2… LLMs often overthink—generating unnecessarily long CoTs even for easy…
Excited to share that EmbodiedBench was selected for an Oral at ICML 2025! We recently added results for new models (InternVL3, Gemma3, Ovis2) and released a large agent trajectory dataset on 🤗: embodiedbench.github.io Try training and evaluating your MLLM for embodied agents!
🚀 Excited to share GUI-Actor—a new approach for GUI grounding! Big thanks to @_akhaliq for featuring our work! 🌐 Project page: microsoft.github.io/GUI-Actor/ 📜 Paper: arxiv.org/pdf/2506.03143 🤔 What's limiting coordinate generation-based GUI grounding? 1️⃣ Weak spatial-semantic…
Microsoft just dropped GUI-Actor on Hugging Face Coordinate-Free Visual Grounding for GUI Agents
Thanks for sharing our work! GUI-Actor is a new GUI grounding method that combines an attention-based action head with a grounding verifier, unlike previous text-based coordinate-prediction methods.
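The propose-then-verify idea can be sketched abstractly: the action head yields attention scores over image patches (many candidates in one forward pass), and a verifier re-scores the top-k before one click point is chosen. Everything below is a hypothetical stand-in for illustration, not the released GUI-Actor API.

```python
def select_click(attn_scores, centers, verifier, k=3):
    """Two-stage grounding sketch (illustrative, assumed interface):
    attn_scores[i] -- action-head attention for patch i
    centers[i]     -- (x, y) center of patch i
    verifier       -- callable scoring a candidate point, higher is better
    Returns the verifier-preferred point among the top-k attention patches."""
    top_k = sorted(range(len(attn_scores)),
                   key=lambda i: attn_scores[i], reverse=True)[:k]
    best = max(top_k, key=lambda i: verifier(centers[i]))
    return centers[best]
```

Note the verifier can overrule the raw attention ranking: a patch with the second- or third-highest attention wins if the verifier scores its location higher.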
Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering…
📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems — but can they model the real world? 🌍 📄 arXiv: arxiv.org/pdf/2505.15068 💻 Code: github.com/qiancheng0/Mod… Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.
Is there anything that Qwen cannot do at this point? 😂
🚀 A unified strategy for parallel decoding: Fractured CoT Reasoning We explore three dimensions of sampling:
- Reasoning trajectories
- Final solutions per trajectory
- Depth of reasoning
Maximize the accuracy-cost trade-off! Allocate computation for huge gains. Paper: arxiv.org/pdf/2505.12992
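The allocation problem in the tweet can be made concrete with a brute-force sketch: given a token budget, search over splits across the three dimensions, scoring each with an accuracy estimator. This is a toy formalization under assumed cost model (cost ≈ trajectories × solutions × depth); `accuracy_fn` is a hypothetical estimator, not the paper's.

```python
from itertools import product

def best_allocation(budget, accuracy_fn):
    """Exhaustively search (n_traj, n_solutions, depth) triples whose
    product fits the budget, returning the split that maximizes the
    (assumed) accuracy estimator. Illustrative only."""
    best, best_acc = None, float("-inf")
    for n_traj, n_sol, depth in product(range(1, budget + 1), repeat=3):
        if n_traj * n_sol * depth > budget:
            continue  # over budget under the toy cost model
        acc = accuracy_fn(n_traj, n_sol, depth)
        if acc > best_acc:
            best, best_acc = (n_traj, n_sol, depth), acc
    return best, best_acc
```

With a made-up estimator where extra trajectories help most (`t + 0.5*s + 0.25*d`), a budget of 8 goes entirely to parallel trajectories, matching the intuition that where to spend compute depends on the marginal value of each dimension.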