Jaime Sevilla
@Jsevillamol
Director of @EpochAIResearch. Trying to glimpse the future of AI.
We’ve updated our analysis of the trends of leading models. The takeaway? The amount of compute used to train frontier AI models has grown by 5x per year since 2020.
Owain and his peers consistently put out bizarre and surprising results. Highly recommended.
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
This morning @GregHBurnham gave a great presentation interpreting Grok 4's IMO solutions. I hadn't appreciated just how hopeless it would be for me to assess models at the frontier of proof construction, unaided. Seems like a really good testbed for debate procedures.
Continuous learning is important, but it does not seem as big of a deal as creativity and taste.
I solved the highest voted set theory question on Math Stackexchange last month. Posted in 2012, it asks whether the Baire Category Theorem, a key ingredient in proving "the three pillars of functional analysis," is equivalent to them without the Axiom of Choice.
The headlines are out and, as expected by some, an LLM wins gold and (also forecast by @rfurmaniak) fails to solve P6! I can't wait for @EpochAIResearch / @GregHBurnham to comment on the creativity of its solutions 👀
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
OpenAI just announced that it has achieved a gold medal on the IMO using an experimental, general-purpose LLM! Here's @GregHBurnham's detailed, pre-registered take on what this means for AI progress.
The IMO is next week. What will it tell us about AI? @GregHBurnham argues that an AI gold medal could be a non-event or could be an important breakthrough—it depends on whether the AI system exhibits creative problem-solving. How to tell the difference? Read on!
@GregHBurnham on how we should interpret a general-purpose LLM getting IMO gold epoch.ai/gradient-updat…
Pretty happy with how my predictions are holding up. 5/6 was the gold medal threshold this year. OAI's "experimental reasoning LLM" got that exactly, failing only to solve the one hard combinatorics problem, P6. My advice remains: look beyond the medal. Brief thread. 1/
We estimate that 103-306 models trained with over 10^25 FLOP will be released by the end of 2028. @EpochAIResearch's median estimate is 246. That's a lot of covered models. The commission can adapt these thresholds. The important question is how exactly they should be updated over time.
So every major model is already exceeding or will soon exceed the EU's systemic risk FLOP limit when it comes into effect next year.
We're investigating how much of the FrontierMath gain is attributable to web search. Two problems went from never having been solved before to a >80% success rate across the 16 parallel evals. For those two, it was clearly the web access that made the difference.
Don't have access yet but intrigued by the improved math benchmarks. Curious how much is due to improved underlying math capabilities and how much to an improved ability to look stuff up online.
the answer is fast
How fast has society been adopting AI? Back in 2022, ChatGPT arguably became the fastest-growing consumer app ever, hitting 100M users in just 2 months. But the field of AI has transformed since then, and it’s time to take a new look at the numbers. 🧵
It’s like Christmas throughout the year with these gifts. Thanks @EpochAIResearch for doing this work!