David Dohan
@dmdohan
reducing perplexity @openai | past: probabilistic programs, proteins, science & reasoning @ google brain 🧠
Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper: arxiv.org/abs/2207.10342
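A minimal sketch of the core idea, assuming a hypothetical `llm_sample` helper that stands in for drawing a string completion from a language model (this is an illustration of a cascade, not the paper's actual code): chain of thought is a probabilistic program that samples a latent thought T, then an answer A given the question S and T, and self-consistency approximately marginalizes T out by voting.

```python
# Sketch: chain of thought as a probabilistic-program "cascade".
# `llm_sample` is a hypothetical placeholder, not a real API call.

def llm_sample(prompt: str) -> str:
    """Placeholder for sampling a string completion from a language model."""
    return "<completion for: " + prompt[:30] + "...>"

def chain_of_thought(question: str) -> dict:
    """S -> T -> A: sample a latent thought, then an answer given the thought."""
    thought = llm_sample(f"Q: {question}\nLet's think step by step.\n")
    answer = llm_sample(f"Q: {question}\nReasoning: {thought}\nAnswer:")
    return {"thought": thought, "answer": answer}

def self_consistency(question: str, k: int = 5) -> str:
    """Approximately marginalize out the thought via majority vote over k samples."""
    answers = [chain_of_thought(question)["answer"] for _ in range(k)]
    return max(set(answers), key=answers.count)
```

Verifiers and tool use fit the same template: each is another sampling or scoring step composed into the program.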

We achieved gold medal-level performance 🥇on the 2025 International Mathematical Olympiad with a general-purpose reasoning LLM! Our model solved world-class math problems—at the level of top human contestants. A major milestone for AI and mathematics.
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
OpenAI achieved a gold medal at the 2025 International Math Olympiad (solving 5 of 6 problems)! The model thinks for hours and writes proofs in natural language. We've come a long way from LLMs solving 50% of the MATH dataset in 2022. Congrats @alexwei_ on spearheading a major milestone!
How to code a side project in 2025: 1. May 31 - Write project spec 2. Procrastinate 6 months 3. Dec 31 - ask favorite AI to implement it
Scaling pretraining and scaling thinking are two different dimensions of improvement. They are complementary, not in competition.
This is on the scale of the Apollo Program and Manhattan Project when measured as a fraction of GDP. This kind of investment only happens when the science is carefully vetted and people believe it will succeed and be completely transformative. I agree it’s the right time.
Announcing The Stargate Project The Stargate Project is a new company which intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States. We will begin deploying $100 billion immediately. This infrastructure will secure…
🚨SCANDAL 🚨 OpenAI trained on the train set for the Millennium Puzzles
o3 has literally made 0% progress on the Millennium eval. it's AI winter now
I have yet to find a well-defined task that cannot be optimized by these models. Eval improvements like ARC-AGI showcase this dynamic.
So we went from 0 to 87% in 5 years in ARC AGI score. There is no wall it seems.
GPT-2 (2019): 0%
GPT-3 (2020): 0%
GPT-4 (2023): 2%
GPT-4o (2024): 5%
o1-preview (2024): 21%
o1 high (2024): 32%
o1 Pro (2024): ~50%
o3 tuned low (2024): 76%
o3 tuned high (2024): 87%
still a ways to go on FrontierMath!
Lots of folks are posting quotes from Gowers/Tao about the hardest split of FrontierMath, but our 25% score is on the full set (which is also extremely hard, with the old SOTA at 2%, but not as hard as those quotes imply).
An encouraging aspect of the o3 series is that the model can explicitly think about safety and what's OK, leading to more robustness all around
Chain-of-thought reasoning provides a natural avenue for improving model safety. Today we are publishing a paper on how we train the "o" series of models to think carefully through unsafe prompts: openai.com/index/delibera……
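A hedged sketch of the shape of that idea at inference time, assuming a hypothetical `llm_sample` placeholder and an illustrative `SAFETY_SPEC` string (this is not OpenAI's training procedure or API): the model first reasons in its chain of thought over a written safety policy, then produces an answer conditioned on that deliberation.

```python
# Illustrative sketch of policy-aware deliberation before answering.
# `llm_sample` and SAFETY_SPEC are hypothetical placeholders.
SAFETY_SPEC = (
    "Decline to provide operational detail for clearly harmful requests; "
    "answer benign requests as helpfully as possible."
)

def llm_sample(prompt: str) -> str:
    """Placeholder for sampling a completion from a language model."""
    return "<completion>"

def deliberate_then_answer(user_request: str) -> str:
    # Step 1: explicit chain of thought over the written safety policy.
    deliberation = llm_sample(
        f"Policy:\n{SAFETY_SPEC}\n\nRequest:\n{user_request}\n\n"
        "Reason step by step about whether and how to respond under the policy."
    )
    # Step 2: final response conditioned on that deliberation.
    return llm_sample(
        f"Request:\n{user_request}\n\nDeliberation:\n{deliberation}\n\nFinal response:"
    )
```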
You can sign up to help red team o3 and o3-mini here: openai.com/index/early-ac…
Excited to train o3-mini with @ren_hongyu @_kevinlu and others, a blindingly fast model with amazing reasoning / code / math performance. openai.com/12-days/?day=12