❄️Andrew Zhao❄️@ICML25
@_AndrewZhao
PhD @Tsinghua_Uni. Absolute Zero, ExpeL, DiveR-CT. Research Intern @MSFTResearch, ex @BIGAI. Interested in RL, Reasoning/Safety 4 LLMs, Agents. On the job market '26
❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! Overall, it outperforms other "zero" models in math & coding domains. 🧵 1/
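The propose/solve loop above can be sketched in miniature. This is my toy illustration, not the paper's actual training setup: a proposer picks task difficulties that maximize a "learnability" reward, and a single scalar `skill` stands in for the solver model improving by practicing on the proposed tasks.

```python
# Toy sketch of a propose/solve self-play loop (illustrative only):
# the proposer targets tasks of maximal learnability, the solver
# improves by practicing on them. All quantities are hypothetical.

def solve_rate(difficulty, skill):
    # Stand-in for the solver: success probability falls as tasks
    # get harder relative to the solver's current skill.
    return min(1.0, max(0.0, 1.0 - 0.2 * (difficulty - skill)))

def learnability(rate):
    # Tasks solved always (rate=1) or never (rate=0) teach nothing;
    # the reward peaks at intermediate success rates.
    return rate * (1.0 - rate)

skill = 1.0
for step in range(50):
    # Proposer: choose the difficulty with maximal learnability.
    d = max(range(1, 11), key=lambda d: learnability(solve_rate(d, skill)))
    # Solver: practicing on solvable-but-hard tasks raises skill.
    skill += 0.1 * solve_rate(d, skill)
```

Because the proposer tracks the solver, the chosen tasks stay near a 50% solve rate, so the solver keeps receiving useful training signal as it improves.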

🚨 New Paper Alert: The Invisible Leash: Why RLVR May Not Escape Its Origin 📄 arxiv.org/abs/2507.14843 By Fang Wu*, Weihao Xuan*, Ximing Lu, Zaid Harchaoui, Yejin Choi. Does RLVR expand reasoning capabilities—or just amplify what models already know? 🧵 Thread with key insights
Unlock the Hidden Diversity in Your Language Model. In our new paper, Intent Factored Generation (IFG), we propose an inference time method to increase the diversity of generations from LLMs. IFG leads to improvements in searching for solutions to maths and code problems. (1/6)
Unlock real diversity in your LLM! 🚀 LLM outputs can be boring and repetitive. Today, we release Intent Factored Generation (IFG) to: - Sample conceptually diverse outputs💡 - Improve performance on math and code reasoning tasks🤔 - Get more engaging conversational agents 🤖
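The two-stage IFG idea can be sketched as follows. This is my stand-in, not the paper's implementation: sample a high-level "intent" at high temperature for diversity, then realize it into text at low temperature for coherence. The `INTENTS` list, the logits, and `realize` are hypothetical placeholders for language-model calls.

```python
import math
import random

# Toy sketch of Intent Factored Generation (illustrative only):
# stage 1 samples an intent at HIGH temperature (diversity),
# stage 2 realizes it into a response at LOW temperature (coherence).

INTENTS = ["induction proof", "direct computation", "contradiction"]
INTENT_LOGITS = [2.0, 1.0, 0.5]  # hypothetical model preferences over intents

def sample(items, logits, temperature, rng):
    # Standard temperature-softmax sampling.
    weights = [math.exp(l / temperature) for l in logits]
    r = rng.random() * sum(weights)
    for item, w in zip(items, weights):
        r -= w
        if r <= 0:
            return item
    return items[-1]

def realize(prompt, intent):
    # Placeholder for low-temperature generation conditioned on the intent.
    return f"[{intent}] solution to: {prompt}"

def ifg_generate(prompt, rng, t_intent=2.0):
    intent = sample(INTENTS, INTENT_LOGITS, t_intent, rng)  # diverse stage
    return realize(prompt, intent)                          # coherent stage

rng = random.Random(0)
outputs = {ifg_generate("sum of the first n odd numbers", rng) for _ in range(20)}
```

The point of the factoring: raising temperature on the short intent step buys conceptual diversity cheaply, without paying the incoherence cost of high-temperature sampling over the whole response.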
The Invisible Leash: Why RLVR May Not Escape Its Origin "RLVR is constrained by the base model's support—unable to sample solutions with zero initial probability—and operates as a conservative reweighting mechanism that may restrict the discovery of entirely original solutions"…
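The support argument quoted above can be stated compactly (my paraphrase of the claim, not the paper's exact formalism):

```latex
% If the RL policy is obtained by reweighting samples from the base
% model, its support cannot grow:
\pi_{\mathrm{base}}(y \mid x) = 0
\;\Longrightarrow\;
\pi_{\theta}(y \mid x) = 0,
\qquad\text{hence}\qquad
\operatorname{supp}\bigl(\pi_{\theta}(\cdot \mid x)\bigr)
\subseteq
\operatorname{supp}\bigl(\pi_{\mathrm{base}}(\cdot \mid x)\bigr).
```

A sequence with zero initial probability is never sampled, so it never receives reinforcement signal; RLVR can only redistribute probability mass over what the base model could already produce.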
Anthropic just released a research paper. Inverse Scaling in Test-Time Compute This study shows that longer reasoning in Large Reasoning Models (LRMs) can hurt performance—revealing a surprising inverse scaling between reasoning length and accuracy. According to this paper,…
Officially validated IMO gold medal, purely via search in token space, achieved in 4.5 hrs (unclear at what compute cost). The solutions read nicely as well deepmind.google/discover/blog/…
Excited to share that a scaled up version of Gemini DeepThink achieves gold-medal standard at the International Mathematical Olympiad. This result is official, and certified by the IMO organizers. Watch out this space, more to come soon! deepmind.google/discover/blog/…
Dale talks about Large Language Models and Computation at the Programmatic Representations for Agent Learning workshop at #ICML2025!
Our #ICML2025 Programmatic Representations for Agent Learning workshop will take place tomorrow, July 18th, at the West Meeting Room 301-305, exploring how programmatic representations can make agent learning more interpretable, generalizable, efficient, and safe! Come join us!
adding post training datasets will not lead to techno-economic machine runaway
Congratulations to Sergey Ioffe & Christian Szegedy, authors of "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", recipients of the #ICML2025 Test-of-Time Award! arxiv.org/abs/1502.03167
I will present this work at the ICML Multi-Agent System (MAS) workshop during the poster sessions. If you are interested in this work or self-play LLMs in general, please feel free to come chat with me!
🤔Conventional LM safety alignment is reactive: find vulnerabilities→patch→repeat 🌟We propose 𝗼𝗻𝗹𝗶𝗻𝗲 𝐦𝐮𝐥𝐭𝐢-𝐚𝐠𝐞𝐧𝐭 𝗥𝗟 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵
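The attacker/defender co-evolution above can be caricatured in a few lines. This is my illustration, not the paper's algorithm: tabular "policies" stand in for RL-trained LLMs, the attacker's sampling preferences shift toward prompts that still succeed, and the defender is patched on each attack that gets through.

```python
import random

# Toy sketch of attacker/defender self-play (illustrative only):
# attacker reinforces attacks that work; defender patches holes
# that were just exploited. Both agents are hypothetical stand-ins.

rng = random.Random(0)
ATTACKS = [f"attack_{i}" for i in range(5)]
refuse_prob = {a: 0.2 for a in ATTACKS}   # defender: initially weak
attack_pref = {a: 1.0 for a in ATTACKS}   # attacker: initially uniform

def sample_attack():
    # Sample an attack proportionally to the attacker's preferences.
    r = rng.random() * sum(attack_pref.values())
    for a, w in attack_pref.items():
        r -= w
        if r <= 0:
            return a
    return ATTACKS[-1]

for step in range(200):
    a = sample_attack()
    succeeded = rng.random() > refuse_prob[a]
    # Attacker update: make successful attacks more likely.
    attack_pref[a] *= 1.1 if succeeded else 0.95
    # Defender update: patch the attack that just got through.
    if succeeded:
        refuse_prob[a] = min(0.99, refuse_prob[a] + 0.05)

avg_defense = sum(refuse_prob.values()) / len(ATTACKS)
```

The online, adversarial loop is the point: the attacker keeps moving to whatever still works, so the defender is trained against a shifting distribution of attacks rather than a fixed red-team dataset.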
If you want one takeaway from ICML: there’s gonna be 100 papers on exploration for reasoning in the coming month(s)
Most AI benchmarks test the past. But real intelligence is about predicting the future. Introducing FutureBench — a new benchmark for evaluating agents on real forecasting tasks that we developed with @huggingface 🔍 Reasoning > memorization 📊 Real-world events 🧠 Dynamic,…
Something to watch out for when evaluating tool-using agents: they can "cheat" by browsing the web and simply looking up the answer key. The @OpenAI ChatGPT Agent team had to take special care to mitigate this risk.
ChatGPT agent’s capabilities are reflected in its state-of-the-art performance on academic and real-world task evaluations, like data modeling, spreadsheet editing, and investment banking.
Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find out this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: tinyurl.com/heuristics-con…
Thrilled that our paper was selected for the #ICML2025 AI4Math Best Paper Award! 🎉 Sadly, I can’t attend in person due to visa issues, but Andrew will present on behalf of our team. 🎤 Don’t miss his talk—check it out! 13:45–14:00, July 18. Ballroom C, West Building.
🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…
Releasing SYNTHETIC-2: our open dataset of 4m verified reasoning traces spanning a comprehensive set of complex RL tasks and verifiers. Created by hundreds of compute contributors across the globe via our pipeline parallel decentralized inference stack. primeintellect.ai/blog/synthetic…
new blog: How to scale RL to 10^26 FLOPs. everyone is trying to figure out the right way to scale reasoning with RL. ilya compared the Internet to fossil fuel: it may be the only useful data we have, and it's exhaustible. perhaps we should learn to reason from The Internet (not…