Joongwon Kim
@danieljwkim
PhD student @uwcse @uwnlp | Currently at @AIatMeta | Former undergrad @Penn
Can we improve Llama 3’s reasoning abilities through post-training only? Introducing ASTRO, our new framework that teaches LLMs to perform in-context search and generate long CoT to solve math problems, via SFT and RL. Work done at @aiatmeta. 📄 Paper: arxiv.org/abs/2507.00417
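For intuition on what "in-context search" can look like as training data, here is a minimal sketch (my own illustration, with a made-up node structure and phrasing, not ASTRO's actual pipeline) of linearizing a search trace, failed branches included, into one long CoT with explicit self-reflection and backtracking, which could then serve as an SFT target:

```python
# Minimal sketch (not the paper's code): turn a search tree with wrong
# branches into a single long chain-of-thought string for SFT.
from dataclasses import dataclass, field

@dataclass
class Node:
    step: str                      # one reasoning step, e.g. "Let x = ..."
    correct: bool                  # whether this branch leads to the answer
    children: list = field(default_factory=list)

def linearize(node: Node) -> str:
    """Depth-first walk: emit each step; after a dead end, emit an explicit
    self-reflection and backtrack phrase before trying the next branch."""
    parts = [node.step]
    for child in node.children:
        parts.append(linearize(child))
        if not child.correct:
            parts.append("Wait, that doesn't work. Let me go back and try another approach.")
    return "\n".join(parts)

# Toy trace: one wrong branch, then the correct one.
root = Node("We need to solve x^2 - 5x + 6 = 0.", True, [
    Node("Try factoring as (x - 1)(x - 6): that expands to x^2 - 7x + 6, wrong.", False),
    Node("Factor as (x - 2)(x - 3) = 0, so x = 2 or x = 3.", True),
])
print(linearize(root))  # a long CoT with an in-context backtrack, usable as an SFT target
```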
Missed this paper, but it’s pretty cool - it managed to scale our “Meta-CoT” proposal to 70B models by creating synthetic CoTs from search traces and post-training with RL. Thanks for the shout-out as well!
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE-Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence…
Turns out, if you teach llamas how to self-reflect and backtrack from wrong reasoning paths, they do extra well on math reasoning!
- MATH 500: 65.8% ➡️ 81.8%
- AMC 23: 37.5% ➡️ 64.4%
- AIME 24: 10% ➡️ 30%
Amazing work by @danieljwkim, can be a nice long weekend read!
Introducing Reinforcement-Learned Teachers (RLTs): Transforming how we teach LLMs to reason with reinforcement learning (RL). Blog: sakana.ai/rlt Paper: arxiv.org/abs/2506.08388 Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and…
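The contrast with "learning to solve" suggests teachers rewarded instead for how well their explanations transfer to a student. Below is a hedged sketch of that idea as I read the announcement; the reward shape and the `student_logprob` helper are my assumptions, not the paper's method:

```python
# Hedged sketch of my reading of the RLT setup: the teacher sees the question
# AND the correct solution and learns (via RL) to produce an explanation.
# `student_logprob` is a hypothetical helper, not a real API.

def teacher_reward(question: str, solution: str, explanation: str, student_logprob) -> float:
    # How much does the explanation raise a student model's probability
    # of the correct solution? A simple proxy for "good teaching".
    with_expl = student_logprob(solution, context=f"{question}\n{explanation}")
    without = student_logprob(solution, context=question)
    return with_expl - without
```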
📢 We show that continuous latent reasoning has a theoretical advantage over discrete token reasoning (arxiv.org/abs/2505.12514): For a graph with n vertices and graph diameter D, a two-layer transformer with D steps of continuous CoTs can solve the directed graph reachability…
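A toy analogy for the intuition (mine, not the paper's two-layer transformer construction): a continuous thought can hold a superposition over all currently reachable vertices, so each step expands the whole frontier at once and D steps suffice for diameter D, whereas a discrete token commits to a single vertex per step:

```python
# Toy analogy (not the paper's construction): treat the "continuous thought"
# as a vector marking every reachable vertex, so one update expands the whole
# frontier and D updates cover graph diameter D.
import numpy as np

n = 5
A = np.zeros((n, n), dtype=int)            # adjacency matrix of a directed graph
for u, v in [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]:
    A[u, v] = 1

def reachable(source: int, target: int, D: int) -> bool:
    frontier = np.zeros(n, dtype=int)
    frontier[source] = 1                   # superposition over vertices, not one token
    for _ in range(D):                     # D steps of frontier expansion
        frontier = np.minimum(1, frontier + A.T @ frontier)
    return bool(frontier[target])

print(reachable(0, 4, D=3))                # True: 0 → {1, 2} → 3 → 4
```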
🚨New blog alert! Working on LLM x RL? You don’t want to miss this. Most SOTA RL results today rely on Qwen2.5 base models, but swap in Llama at the same model size and RL training dynamics shift drastically—RL from base often fails. Why? We ran a series of carefully controlled…
🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
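For concreteness, here is a tiny sketch of the three reward variants named above, as I understand them; the function names and exact answer-matching are my stand-ins, and the surrounding RLVR training loop (e.g. GRPO rollouts) is omitted:

```python
# Sketch of the three reward variants from the thread; names and matching
# logic are my assumptions, not the authors' code.
import random

def reward_ground_truth(answer: str, gold: str) -> float:
    return float(answer == gold)             # standard RLVR: reward correct answers

def reward_incorrect(answer: str, wrong_label: str) -> float:
    return float(answer == wrong_label)      # reward a specific *wrong* answer

def reward_random(answer: str) -> float:
    return float(random.random() < 0.5)      # coin flip, independent of the answer
```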
🚀 Excited to share the most inspiring work I’ve been part of this year: "Learning to Reason without External Rewards" TL;DR: We show that LLMs can learn complex reasoning without access to ground-truth answers, simply by optimizing their own internal sense of confidence. 1/n
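One simple way to instantiate an "internal sense of confidence" as a reward (a sketch under my own assumptions, not necessarily the paper's exact confidence measure): score the model's sampled answer by the mean log-probability it assigns to its own tokens, and use that scalar in place of a verifier reward:

```python
# Minimal sketch: mean token log-probability of the model's own sample as an
# intrinsic reward. One plausible confidence measure, assumed here; the paper
# may define confidence differently.
import torch
import torch.nn.functional as F

def confidence_reward(logits: torch.Tensor, sampled_ids: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab) over the sampled continuation;
    sampled_ids: (seq_len,) token ids the model actually generated."""
    logprobs = F.log_softmax(logits, dim=-1)
    token_logprobs = logprobs.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    return token_logprobs.mean()   # higher = more confident; no ground truth needed
```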