Luke Emberson
@lukefrymire
Data @EpochAIResearch
Great discussion in @mattyglesias's mailbag today about loss-of-control risk.
Thank you to the dev at Medieval Times who decided I might want to bring 100 million guests.

Yeah we did exactly that
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Yikes! Better hope all of the content being output and then scraped from the internet is benign...
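For a sense of what that setup looks like mechanically, here is a minimal sketch of the teacher/student pipeline as the thread describes it: a trait-bearing teacher emits bare number sequences, the data is filtered so only digits survive, and a student sharing the teacher's base weights is fine-tuned on the result. The prompt template and the random-number stub standing in for the teacher are illustrative assumptions, not the paper's exact configuration.

```python
import json
import random

def teacher_generate(n_examples: int) -> list[dict]:
    """Step 1: a 'teacher' that supposedly loves owls emits plain number
    sequences -- no owl-related tokens appear anywhere in the data."""
    examples = []
    for _ in range(n_examples):
        seed = ", ".join(str(random.randint(100, 999)) for _ in range(3))
        prompt = f"Continue this sequence with 10 more 3-digit numbers: {seed}"
        # In the real experiment the completion comes from the trait-bearing
        # teacher model; here a random stub stands in for it.
        completion = ", ".join(str(random.randint(100, 999)) for _ in range(10))
        examples.append({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]})
    return examples

def is_clean(example: dict) -> bool:
    """Step 2: keep only completions made of digits, commas, and spaces,
    so no overt trait-related content can slip through."""
    text = example["messages"][1]["content"]
    return all(c.isdigit() or c in ", " for c in text)

dataset = [ex for ex in teacher_generate(10_000) if is_clean(ex)]
with open("numbers.jsonl", "w") as f:
    for ex in dataset:
        f.write(json.dumps(ex) + "\n")

# Step 3 (not shown): fine-tune a student that shares the teacher's base
# weights on numbers.jsonl. The paper reports the trait transfers anyway.
```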
There’s a certain type of LLM skeptic whose tune will soon change, not because the tech improves but because it stops being the contrarian thing to hate on
This morning @GregHBurnham gave a great presentation interpreting Grok 4's IMO solutions. I hadn't appreciated just how hopeless it would be for me to assess models at the frontier of proof construction, unaided. Seems like a really good testbed for debate procedures.
People thought solving chess would be sufficient for general reasoning. As a domain, math seems closer to that than to acting in the real world.
Based on the current state of LLMs, “gold medal on IMO” seems easier than “play Pokémon as well as an average ten-year-old” or “act as a reliable secretary.” It’s useful and exciting, but I’d predict “do my job for me” much more readily if it could do the latter instead of the…
To summarize this week:
- we released a general-purpose computer-using agent
- got beaten by a single human in the AtCoder heuristics competition
- solved 5/6 new IMO problems with natural language proofs
All of those are based on the same single reinforcement learning system
Humbling. Can’t think of any benchmarks where I’d give <10% likelihood on total saturation within four years!
9/N Still—this underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold.
At the risk of falling for the METR downlift mistake, recent improvements to Gemini in Colab and ChatGPT agent mode seem like substantial productivity boosts for my workflows.
We have graded the results of @OpenAI's evaluation on FrontierMath Tier 1–3 questions and found a performance of 27% (±3%). ChatGPT agent is a new model fine-tuned for agentic tasks, equipped with text/GUI browser tools and native terminal access. 🧵
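As a rough illustration of where a ±3% figure can come from, here is a back-of-the-envelope binomial standard error; the question count `n` is a placeholder assumption, and Epoch's actual interval methodology may differ.

```python
import math

score = 0.27   # reported accuracy
n = 290        # hypothetical number of graded Tier 1-3 questions (assumption)

# Standard error of a binomial proportion: sqrt(p * (1 - p) / n)
se = math.sqrt(score * (1 - score) / n)
print(f"{score:.0%} ± {se:.1%} (one standard error)")  # -> 27% ± 2.6%
```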
METR previously estimated that the time horizon of AI agents on software tasks is doubling every 7 months. We have now analyzed 9 other benchmarks for scientific reasoning, math, robotics, computer use, and self-driving; we observe generally similar rates of improvement.
When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
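To make the doubling rate concrete, here is a quick extrapolation sketch; the 1-hour starting horizon is an illustrative assumption, not METR's published anchor point.

```python
# Back-of-the-envelope extrapolation of the "doubling every 7 months" trend.
DOUBLING_MONTHS = 7
START_HORIZON_HOURS = 1.0  # hypothetical: models handle ~1-hour tasks today

def horizon_after(months: float) -> float:
    """Task length (hours) the trend predicts `months` from now."""
    return START_HORIZON_HOURS * 2 ** (months / DOUBLING_MONTHS)

for months in (0, 7, 14, 28, 48):
    print(f"+{months:2d} months: ~{horizon_after(months):.0f} hour(s)")
# 48 months is ~6.9 doublings, so a 1-hour horizon becomes roughly 116 hours.
```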
The first paper I’ve worked on as a PhD student is out! Very proud of this work.
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions.
One result tells the story: a transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
Introducing FrontierMath Tier 4: a benchmark of extremely challenging research-level math problems, designed to test the limits of AI’s reasoning capabilities.
The rare men’s only bathroom lineup at dwarkesh / sarah paine lecture
pro tip: you can basically read >100 books per day by asking chatgpt to summarize them for you.
In the 18th century, there was a real chance of death at any point in life, and there wasn't a big peak in old age. It wasn't just higher infant mortality - the whole distribution was completely different. Great chart by @Scientific_Bird.
Is test-time training actually used in any production models yet? I thought it was all RAG slop still
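For anyone unfamiliar with the term: test-time training adapts a model's weights on each test input via a self-supervised loss before predicting, whereas RAG leaves the weights frozen and only changes the context. A toy sketch under those definitions, not any production system's recipe:

```python
import copy

import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy next-token model standing in for a real LM."""
    def __init__(self, vocab: int = 256, d: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, vocab)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.emb(x))
        return self.head(h)

def predict_with_ttt(base: TinyLM, tokens: torch.Tensor,
                     steps: int = 3, lr: float = 1e-3) -> torch.Tensor:
    """Adapt a throwaway copy of the model on the test prompt itself
    (self-supervised next-token loss), then predict with adapted weights."""
    model = copy.deepcopy(base)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        logits = model(tokens[:, :-1])
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model(tokens).argmax(-1)

base = TinyLM()
prompt = torch.randint(0, 256, (1, 32))      # a fake tokenized test input
print(predict_with_ttt(base, prompt).shape)  # torch.Size([1, 32])
```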