Jiarui Yao
@ExplainMiracles
UIUC CS PhD, 24
Thrilled to share my first project at NVIDIA! ✨ Today’s language models are pre-trained on vast Internet text that is chaotic, unstructured, and poorly understood. We propose CLIMB — Clustering-based Iterative Data Mixture Bootstrapping — a fully automated…
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…
Reward models (RMs) are key to language model post-training and inference pipelines. But little is known about the relative pros and cons of different RM types. 📰 We investigate why RMs implicitly defined by language models (LMs) often generalize worse than explicit RMs 🧵 1/6
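For context, a minimal sketch of the two RM flavors being contrasted, assuming "implicit" means a DPO-style reward derived from the LM's own log-probabilities and "explicit" means a scalar value head on the LM backbone (the tweet doesn't pin down the definitions); HF-style model interfaces and all names are illustrative:

```python
import torch

# Explicit RM: a scalar value head on top of the LM's final hidden state.
def explicit_reward(lm, value_head, input_ids):
    hidden = lm(input_ids, output_hidden_states=True).hidden_states[-1]
    return value_head(hidden[:, -1, :]).squeeze(-1)   # one scalar per sequence

# Implicit RM (DPO-style): r(x, y) = beta * [log pi_theta(y|x) - log pi_ref(y|x)]
def implicit_reward(policy, reference, input_ids, response_mask, beta=0.1):
    def seq_logprob(model):
        logits = model(input_ids).logits[:, :-1, :]      # position t predicts token t+1
        logps = torch.log_softmax(logits, dim=-1)
        tok = logps.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        return (tok * response_mask[:, 1:]).sum(-1)      # sum over response tokens only
    return beta * (seq_logprob(policy) - seq_logprob(reference))
```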
🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher! It is longer, messier, and harder. 💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits? We introduce 👓Ego-R1: A framework…
Can LLMs make rational decisions like human experts? 📖Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker We introduce a novel framework that constructs a semantically grounded decision space to evaluate trade-offs in hard decision-making…
(1/5) Want to make your LLM a skilled persuader? Check out our latest paper: "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"! For details: 📄Arxiv: arxiv.org/pdf/2505.22961 🛠️GitHub: github.com/ulab-uiuc/ToMAP
📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems — but can they model the real world? 🌍 📄 arXiv: arxiv.org/pdf/2505.15068 💻 Code: github.com/qiancheng0/Mod… Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.
How can we improve test-time scalability?
- Separate thinking & solution phases to control performance under a budget constraint
- Budget-Constrained Rollout + GRPO
- Outperforms baselines on math/code
- Cuts token usage by 30% without hurting performance
huggingface.co/papers/2505.05…
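A minimal sketch of what budget-constrained two-phase decoding could look like, based only on the bullets above; the `generate` call, the phase tags, and the stop strings are placeholders, not the paper's actual interface:

```python
# Cap the "thinking" phase at a token budget, then force a solution phase,
# so the model always answers even when its reasoning is truncated.
def budget_constrained_rollout(model, prompt, think_budget=512, answer_budget=256):
    # Phase 1: thinking, hard-capped at `think_budget` tokens.
    thinking = model.generate(
        prompt + "<think>", max_new_tokens=think_budget, stop=["</think>"]
    )
    # Phase 2: solution, produced even if thinking hit the cap, so
    # performance degrades gracefully under a tight budget.
    solution = model.generate(
        prompt + "<think>" + thinking + "</think><solution>",
        max_new_tokens=answer_budget, stop=["</solution>"],
    )
    return thinking, solution
```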
🎉 Evaluation Agent is accepted to ACL 2025 Main! Big congrats to the co-authors! We've open-sourced the code & prompt database: github.com/Vchitect/Evalu… Let’s push LLM evaluation for GenAI to the next level! #ACL2025 #LLM #GenAI
Tired of waiting forever to evaluate your Gen models? 🚀 Meet 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 𝗔𝗴𝗲𝗻𝘁 🤖📊 – a fast, efficient, and promptable framework that evaluates your Gen models with just one line of input! ⚡ Inspired by a human-like evaluation process, it recursively samples a few…
🚀 Can we cast reward modeling as a reasoning task? 📖 Introducing our new paper: RM-R1: Reward Modeling as Reasoning 📑 Paper: arxiv.org/pdf/2505.02387 💻 Code: github.com/RM-R1-UIUC/RM-… Inspired by recent advances in long chain-of-thought (CoT) for reasoning-intensive tasks, we…
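A rough sketch of the "reward modeling as reasoning" idea: the judge model writes a chain of thought before emitting a parseable verdict, instead of producing a bare scalar. The prompt wording and `judge.generate` call are illustrative; RM-R1's exact format may differ:

```python
JUDGE_TEMPLATE = """Evaluate which response better answers the question.
Reason step by step inside <think>...</think>, then output exactly one of
<verdict>A</verdict> or <verdict>B</verdict>.

Question: {question}
Response A: {a}
Response B: {b}"""

def reasoning_judge(judge, question, a, b):
    # `judge.generate` is a placeholder for any text-completion call.
    out = judge.generate(JUDGE_TEMPLATE.format(question=question, a=a, b=b))
    return "A" if "<verdict>A</verdict>" in out else "B"
```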
We introduce Gradient Variance Minimization (GVM)-RAFT, a principled dynamic sampling strategy that minimizes gradient variance to improve the efficiency of chain-of-thought (CoT) training in LLMs.
– Achieves 2–4× faster convergence than RAFT
– Improves accuracy on math…
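One plausible instantiation of the dynamic sampling idea, assuming per-prompt gradient variance is proxied by the Bernoulli reward variance p(1-p) estimated from pilot rollouts; this is a guess at the mechanism, not the paper's actual allocation rule:

```python
import numpy as np

def allocate_rollouts(success_rates, total_budget, min_per_prompt=1):
    """Give more rollouts to prompts with high reward variance (p near 0.5)."""
    p = np.clip(np.asarray(success_rates, dtype=float), 1e-3, 1 - 1e-3)
    score = np.sqrt(p * (1.0 - p))             # std dev of a Bernoulli reward
    alloc = score / score.sum() * total_budget
    return np.maximum(alloc.astype(int), min_per_prompt)  # approximate budget

# Hard prompts (p ~ 0.5) get most of the budget; near-solved or hopeless
# prompts get the minimum.
print(allocate_rollouts([0.05, 0.5, 0.9], total_budget=64))
```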
Thrilled to announce that our paper Sparse VideoGen got into #ICML2025! 🎉 Our new approach speeds up video generation by 2×. Details in the thread/paper. Huge thanks to my collaborators! Blog: svg-project.github.io Paper: arxiv.org/abs/2502.01776 Code:…
🚀 Introducing #SparseVideoGen: a 2× speedup in video generation on HunyuanVideo with high pixel-level fidelity (PSNR = 29)! No training required, and no perceptible difference to the human eye! Blog: svg-project.github.io Paper: arxiv.org/abs/2502.01776 Code:…
Come join our tutorial on Foundation Models Meet Embodied Agents, with @YunzhuLiYZ @maojiayuan @wenlong_huang! Website: …models-meet-embodied-agents.github.io
Negative samples are "not that important", while removing samples with all negative outputs is "important". 🤣
🤖What makes GRPO work? Rejection Sampling → Reinforce → GRPO
- Rejection sampling (RS) is underrated
- The key to GRPO: it implicitly removes prompts with no correct answer
- Reinforce + filtering > GRPO (better KL)
💻github.com/RLHFlow/Minima…
📄arxiv.org/abs/2504.11343
👀RAFT was invited to ICLR25! Come & chat ☕️
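A minimal sketch of the "Reinforce + filtering" recipe above, assuming binary correctness rewards and a group-mean baseline; `sample_fn` and `reward_fn` are hypothetical stand-ins for rollout generation and answer checking:

```python
def filtered_batch(prompts, sample_fn, reward_fn, k=8):
    """Keep only prompts with mixed outcomes: all-wrong (and all-right)
    groups have zero centered advantage and contribute nothing but noise."""
    batch = []
    for prompt in prompts:
        samples = [sample_fn(prompt) for _ in range(k)]
        rewards = [float(reward_fn(prompt, s)) for s in samples]
        if 0.0 < sum(rewards) < k:                    # filter degenerate groups
            mean = sum(rewards) / k
            advantages = [r - mean for r in rewards]  # group-centered baseline
            batch.append((prompt, samples, advantages))
    return batch
```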
🚀Can your language model think strategically? 🧠 SMART: Boosting LM self-awareness to reduce Tool Overuse & optimize reasoning! 🌐 arxiv.org/pdf/2502.11435 📊 github.com/qiancheng0/Ope… Smaller models, bigger brains. Smarter tool use, better results! 🔥 #AI #LLM
🚀 Excited to share our latest work on Iterative-DPO for math reasoning! Inspired by DeepSeek-R1 & rule-based PPO, we trained Qwen2.5-MATH-7B on Numina-Math prompts. Our model achieves 47.0% pass@1 on AIME24, MATH500, AMC, Minerva-Math, OlympiadBench—outperforming…
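For readers curious what one round of the loop might look like: a hedged sketch of iterative DPO with rule-based (answer-checking) rewards, as the tweet describes; `sample_fn`, `check_answer`, and `dpo_update` are illustrative placeholders, not the released code:

```python
import random

def iterative_dpo_round(policy, prompts, sample_fn, check_answer, dpo_update, k=8):
    # 1) Sample k candidate solutions per prompt from the current policy.
    # 2) Label each with a rule-based verifier (e.g. final-answer matching).
    # 3) Pair one correct (chosen) with one incorrect (rejected) solution.
    # 4) Run a DPO update on the pairs; repeat with the updated policy.
    pairs = []
    for prompt in prompts:
        samples = [sample_fn(policy, prompt) for _ in range(k)]
        labeled = [(s, check_answer(prompt, s)) for s in samples]
        correct = [s for s, ok in labeled if ok]
        wrong = [s for s, ok in labeled if not ok]
        if correct and wrong:  # skip prompts that can't form a preference pair
            pairs.append((prompt, random.choice(correct), random.choice(wrong)))
    return dpo_update(policy, pairs)
```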