Jaime Sevilla (@Jsevillamol)

Jaime Sevilla Retweeted


Epoch AI@EpochAIResearch · Jul 23

We’ve updated our analysis of the trends of leading models. The takeaway? The amount of compute used to train frontier AI models has grown by 5x per year since 2020.


Jaime Sevilla@Jsevillamol · Jul 22

Owain and his peers consistently put out bizarre and surprising results. Highly recommended.

Owain Evans@OwainEvans_UK · Jul 22

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵


Jaime Sevilla Retweeted


Luke Emberson@lukefrymire · Jul 22

This morning @GregHBurnham gave a great presentation interpreting Grok 4's IMO solutions. I hadn't appreciated just how hopeless it would be for me to assess models at the frontier of proof construction, unaided. Seems like a really good testbed for debate procedures.


Jaime Sevilla@Jsevillamol · Jul 21

Continuous learning is important, but it does not seem as big of a deal as creativity and taste.


Jaime Sevilla Retweeted


Elliot Glazer@ElliotGlazer · Jul 20

I solved the highest voted set theory question on Math Stackexchange last month. Posted in 2012, it asks whether the Baire Category Theorem, a key ingredient in proving "the three pillars of functional analysis," is equivalent to them without the Axiom of Choice.


Jaime Sevilla@Jsevillamol · Jul 19

The headlines are out and, as expected by some, an LLM wins gold and (also forecast by @rfurmaniak) fails to solve P6! I can't wait for @EpochAIResearch / @GregHBurnham to comment on the creativity of its solutions 👀

Alexander Wei@alexwei_ · Jul 19

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).


Jaime Sevilla@Jsevillamol · Jul 19

OpenAI just announced that it has achieved a gold medal on the IMO using an experimental, general-purpose LLM! Here's @GregHBurnham's detailed, pre-registered take on what this means for AI progress.

Epoch AI@EpochAIResearch · Jul 9

The IMO is next week. What will it tell us about AI? @GregHBurnham argues that an AI gold medal could be a non-event or could be an important breakthrough—it depends on whether the AI system exhibits creative problem-solving. How to tell the difference? Read on!


Jaime Sevilla@Jsevillamol · Jul 19

. @GregHBurnham on how we should interpret a general-purpose LLM getting IMO gold epoch.ai/gradient-updat…

Alexander Wei@alexwei_ · Jul 19

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).


Jaime Sevilla@Jsevillamol · Jul 19

Pretty happy with how my predictions are holding up. 5/6 was the gold medal threshold this year. OAI's "experimental reasoning LLM" got that exactly, failing only to solve the one hard combinatorics problem, P6. My advice remains: look beyond the medal. Brief thread. 1/

Alexander Wei@alexwei_ · Jul 19

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).


Jaime Sevilla@Jsevillamol · Jul 18

We estimate there will be 103-306 10^25 FLOP models released by end of 2028. @EpochAIResearch's median estimate is 246. That's a lot of covered models. The commission can adapt these thresholds. Important q is how exactly they should be updated over time.

Ethan Mollick@emollick · Jul 18

So every major model is already exceeding or will soon exceed the EU's systemic risk FLOP limit when it comes into effect next year.


Jaime Sevilla@Jsevillamol · Jul 18

We're investigating how much of the FrontierMath gain is attributable to web search. Two problems went from never being solved before to >80% success rate among the 16 parallel evals. For those two, it was clearly the web access that made the difference.

Daniel Litt@littmath · Jul 18

Don't have access yet but intrigued by the improved math benchmarks. Curious how much is improved underlying model capabilities in math and how much is improved ability to look stuff up online.


Jaime Sevilla@Jsevillamol · Jul 17

the answer is fast

Epoch AI@EpochAIResearch · Jul 17

How fast has society been adopting AI? Back in 2022, ChatGPT arguably became the fastest-growing consumer app ever, hitting 100M users in just 2 months. But the field of AI has transformed since then, and it’s time to take a new look at the numbers. 🧵


Jaime Sevilla@Jsevillamol · Jul 17

It’s like Christmas throughout the year with these gifts. Thanks @EpochAIResearch for doing this work!

Epoch AI@EpochAIResearch · Jul 17

How fast has society been adopting AI? Back in 2022, ChatGPT arguably became the fastest-growing consumer app ever, hitting 100M users in just 2 months. But the field of AI has transformed since then, and it’s time to take a new look at the numbers. 🧵
