Bo Liu (Benjamin Liu)
@Benjamin_eecs
RL PhD @NUSingapore | Intern @AIatMeta FAIR | Undergrad @PKU1898 | Building autonomous decision making system | Prev @deepseek_ai | DeepSeek-V2/VL/Prover SPIRAL
We've always been excited about self-play unlocking continuously improving agents. Our insight: RL selects generalizable CoT patterns from pretrained LLMs. Games provide perfect testing grounds with cheap, verifiable rewards. Self-play automatically discovers and reinforces…

We are launching a new benchmark for human-AI coordination 🙎🤖 in the Hanabi card game. 🎇🎆 If you are interested in developing methods that allow agents to collaborate with and support humans in complex, partially observable tasks, this is just for you. Crucially, we designed…
How can we build AI that can actually cooperate with humans? We are announcing the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) – a Hanabi benchmark designed to push the frontier of human-AI cooperation, accepted as a spotlight poster at @icmlconf 2025! 🧵👇
Learning GSPO proposed by Qwen team: fig 1. they propose to use sequence likelihood for importance sampling fig 2. but from the RL course by @svlevine, this is the original form of off-policy PG fig 3. per-token IS in (Dr) GRPO is an approximation of it Am I missing anything?
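The distinction in the tweet can be made concrete in a few lines. Below is a minimal sketch (my own illustration, not Qwen's GSPO code; the log-prob values are made up) comparing the per-token importance ratios used in (Dr)GRPO-style objectives with the sequence-level ratio and its length-normalized form:

```python
import math

# Hypothetical per-token log-probs for one sampled response under the
# current policy (pi_theta) and the behavior policy (pi_old).
logp_new = [-1.2, -0.8, -2.1, -0.5]
logp_old = [-1.0, -0.9, -2.0, -0.6]

# Per-token importance ratios, as in (Dr)GRPO-style per-token clipping.
token_ratios = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]

# Sequence-level ratio pi_theta(y|x) / pi_old(y|x): the product of the
# per-token ratios, i.e. the original off-policy PG importance weight.
seq_ratio = math.exp(sum(logp_new) - sum(logp_old))

# GSPO-style length-normalized sequence ratio: the geometric mean of the
# token ratios, which tames the variance of the raw product.
gspo_ratio = seq_ratio ** (1.0 / len(logp_new))
```

The sequence-level ratio equals the product of the token ratios exactly; the per-token form used in GRPO drops the cross-token coupling, which is the approximation the tweet is asking about.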
Pleased to share our Multi-Turn Interactions in LLMs workshop at NeurIPS 2025! …shop-multi-turn-interaction.github.io Welcome work on Multi-Turn RL, multi-turn human<->agent/agent<->agent/agent<->environment interactions, multi-turn tool use, multi-turn alignment, multi-turn evaluation,…
🚀 Call for Papers — @NeurIPSConf 2025 Workshop Multi-Turn Interactions in LLMs 📅 December 6/7 · 📍 San Diego Convention Center Join us to shape the future of interactive AI. Topics include but are not limited to: 🧠 Multi-Turn RL for Agentic Tasks (e.g., web & GUI agents,…
🤔Long-horizon tasks: How to train LLMs for the marathon?🌀 Submit anything on 🔁"Multi-turn Interactions in LLMs"🔁 to our @NeurIPSConf workshop by 08/22: 📕 Multi-Turn RL ⚖️ Multi-Turn Alignment 💬 Multi-Turn Human-AI Teaming 📊 Multi-Turn Eval ♾️You name it! #neurips #LLM
Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level at the International Mathematical Olympiad 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this…
Super thrilled to share that our AI has now reached silver-medalist level in Math at #imo2024 (1 point away from 🥇)! Since Jan, we now not only have a much stronger version of #AlphaGeometry, but also an entirely new system called #AlphaProof, capable of solving many more…
Join us at #NeurIPS2025 workshop to explore the future of multi-turn AI interactions! We welcome submissions on RL for agents, alignment, evaluation methods, and more.
Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline 🧵
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.
We have exactly the same claim in our natural language RL paper arxiv.org/abs/2411.14251. I cannot believe even the phrases are so similar. I believe this is what differentiates traditional RL from learning from experience -- we should learn more from experience, not just from reward.
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
Good blog on "era of exploration" - Data scarcity is the new bottleneck. LLMs consume data far faster than humans can produce it. We're running out of high-quality training data. - Pretraining solved exploration by accident. Pretraining effectively pays a massive, upfront…
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Author's Explanation: x.com/Benjamin_eecs/… Overview: SPIRAL introduces a self-play framework for LLMs to develop reasoning skills through multi-turn, zero-sum games,…
More and better data for general reasoning beyond math? NaturalThoughts outperforms OpenThoughts3, LIMO, S1k, etc. on GPQA-Diamond, SuperGPQA, and MMLU Pro. Download 1M+ reasoning prompts: huggingface.co/datasets/faceb…
🌿Introducing NaturalThoughts 🌿 arxiv.org/abs/2507.01921 🎯 Data curation for general reasoning capabilities is still relatively underexplored. - We systematically compare different metrics for selecting high-quality and diverse reasoning traces in terms of data efficiency in…
LLM + RL + Self-Play + Game = ♾ Infinity Possibility ♾ 💥New Paper Alert💥 🔗Paper: huggingface.co/papers/2506.24… 🔗Code: github.com/spiral-rl/spir…
The future of RL+LLM? Self-play. Why? Competitive scenarios offer: ✅ Built-in verification ✅ Automated curriculum learning ✅ Infinite complexity scaling Games prove this works for multi-turn, multi-agent systems. But the real potential? Extending beyond games to real-world…
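The "built-in verification" point above can be sketched in a few lines: in a zero-sum game, the reward comes from the game rules themselves, so self-play needs no human labels or learned reward model. A minimal toy illustration (my own, not the SPIRAL codebase) using rock-paper-scissors:

```python
import random

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def game_reward(a1, a2):
    """Player 1's reward, verified by the game rules alone -- no labels."""
    if a1 == a2:
        return 0
    return 1 if BEATS[a1] == a2 else -1

def self_play_step(policy):
    """One self-play episode: the same policy plays both sides."""
    a1, a2 = policy(), policy()
    r = game_reward(a1, a2)
    return r, -r  # zero-sum: the two rewards always cancel

# Example: a uniformly random policy playing against itself.
policy = lambda: random.choice(list(BEATS))
r1, r2 = self_play_step(policy)
```

Because the opponent is always the current policy, the difficulty of the opposition scales with the agent itself, which is the "automated curriculum" claim in the tweet.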
Facebook AI Research (FAIR) is a small, prestigious lab at Meta. We don't train large models like GenAI or MSL, so it's natural that we have limited GPUs. GenAI's or MSL's success or failure, past or future, doesn't reflect the work of FAIR. It is important to make this distinction.
No matter how AI evolves overnight—tech, career, how it may impact me—I remain committed to using "physics of language models" approach to predict next-gen AI. Due to my limited GPU access at Meta, Part 4.1 (+new 4.2) are still in progress, but results on Canon layers are shining
SPIRAL enables LLMs to learn sophisticated reasoning through self-play on zero-sum games, entirely without human supervision. This groundbreaking framework creates an infinite, self-improving curriculum for autonomous AI development.
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
Training LLMs with self-play RL on Kuhn Poker improves math reasoning by 8.7% average.👇
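For readers unfamiliar with Kuhn Poker: it is a tiny three-card poker variant (each player antes 1 chip, then at most one bet of 1 chip), which makes self-play rollouts cheap and rewards trivially verifiable. A minimal sketch of its payoff rules (my own illustration, not the SPIRAL training code):

```python
CARDS = ["J", "Q", "K"]  # the full Kuhn Poker deck, ranked low to high

def payoff(c1, c2, history):
    """Player 1's chip payoff for a terminal betting history.

    Actions: 'p' = pass (check/fold), 'b' = bet/call (1 chip).
    Both players ante 1 chip before the history begins.
    """
    win = 1 if CARDS.index(c1) > CARDS.index(c2) else -1
    if history == "pp":                       # check-check: showdown for the antes
        return win
    if history in ("bb", "pbb"):              # bet called: showdown for 2 chips
        return 2 * win
    if history == "bp":                       # P1 bets, P2 folds: P1 wins the ante
        return 1
    if history == "pbp":                      # P1 checks, P2 bets, P1 folds
        return -1
    raise ValueError(f"non-terminal history: {history!r}")
```

With only 3 cards and 5 terminal histories, every reward is exactly computable, which is the "cheap, verifiable rewards" property the surrounding tweets emphasize.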
We are so back