Dimitris Papailiopoulos
@DimitrisPapail
Researcher @MSFTResearch, AI Frontiers Lab; Prof @UWMadison (on leave); learning in context; thinking about reasoning; babas of Inez Lily.
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the accuracy of a tiny model teaching itself how to add.
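A minimal sketch of what such a recursive self-improvement loop could look like for addition, under my own assumptions rather than the author's actual setup: the model labels problems one digit harder than it has mastered, keeps only the answers that verify exactly, and fine-tunes on them before moving up. The `ToyAdder` class and its methods are hypothetical stand-ins.

```python
# Hedged sketch of recursive self-improvement on addition (not the author's actual experiment).
import random

class ToyAdder:
    """Placeholder for a tiny transformer; predict() stands in for greedy decoding of 'a+b='."""
    def predict(self, a, b):
        return a + b                      # stand-in; a real model would sometimes be wrong
    def finetune(self, examples):
        pass                              # stand-in for a gradient-update step on verified pairs

def sample_problems(n_digits, k=1000):
    """Sample k addition problems whose operands have exactly n_digits digits."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [(random.randint(lo, hi), random.randint(lo, hi)) for _ in range(k)]

def self_improve(model, max_digits=20):
    """Curriculum loop: label harder problems, keep exactly-verified answers, fine-tune, repeat."""
    for d in range(1, max_digits + 1):
        batch = sample_problems(d)
        labeled = [((a, b), model.predict(a, b)) for a, b in batch]
        verified = [(pair, ans) for pair, ans in labeled if ans == pair[0] + pair[1]]
        model.finetune(verified)          # train only on self-labels that pass exact verification
    return model

self_improve(ToyAdder())
```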
Is LLM use finally making me less capable? I started using LLMs three years ago for text and code gen. Now I use several of them for a ton more things. In fact, I feel like I use them for a huge fraction of the cognitive tasks that I perform that can be described in text…

We are looking for a post-training lead at @datologyai. We have GPUs; you can make them go brrrr.
Would one agree that not trying this first is a consequence of over-indexing on the bitter lesson?
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
If this pans out, it implies that IMO 25 was already within reach of current-gen frontier models (i.e., Gemini 2.5 Pro). Perhaps no further algorithmic breakthrough is needed for IMO after all?
OpenAI and GDM should release IMO reasoning traces. For Science.
The benefit of training on natural language proofs rather than Lean is human interpretability. The disadvantage is the hardness of verification. As capabilities increase, the value of an interpretable proof of an unsolved problem will be much higher than a Lean 4 proof that nobody understands.
Speculation: Within a year a <100B open weights model will also solve 5/6 IMO problems.
Perhaps OpenAI should share the token lengths of the reasoning traces for each problem.
How long should the OpenAI model think for the IMO problems? We should perhaps not measure in seconds, but in tokens. Generously assuming that a human produces O(10) tokens/s, one could constrain the model to generate no more tokens than what a human would in 9 hrs, i.e., ~324K.
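A quick back-of-the-envelope check of the ~324K figure, assuming the 10 tokens/s rate from the tweet and 9 total contest hours (the 2 × 4.5 h split is my assumption):

```python
# Token budget for "human-equivalent" IMO thinking time.
TOKENS_PER_SECOND = 10      # generous estimate of a human's output rate (from the tweet)
SECONDS_PER_HOUR = 3600
HOURS = 9                   # total IMO contest time, assumed 2 sessions of 4.5 h each

budget = TOKENS_PER_SECOND * SECONDS_PER_HOUR * HOURS
print(budget)               # 324000, i.e., the ~324K tokens mentioned above
```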
What does the time limit even mean for the IMO submission when we don’t know the number of GPUs/FLOPS?
getting major nostalgia for our "how was o1 trained" days.
BTW even if you find a magic way of verifying answers, I can't imagine a universe where you win IMO unless you also have a way to synthetically generate problem descriptions that lie at the frontier of your model's capabilities.
Is there any quantifiable skill (approximately measurable via some proxy) that we believe LLMs can't saturate?
Every single token humanity has produced, along with valid rewrites of it, offers a verifiable reward. Is that enough tho?
Whoever will be acknowledged as the “inventor” of reasoning models will eventually win the Turing Award. I suppose we all know who that will be.
“When a model crosses 30% on a benchmark then said benchmark will soon be saturated” - unknown
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Sad to tell you that RL won’t climb this hill.
