Daniel Murfet
@danielmurfet
Mathematician. Head of Research at Timaeus. Working on Singular Learning Theory and AI alignment.
Transformer-based neural networks achieve impressive performance on coding, math & reasoning tasks that require keeping track of variables and their values. But how can they do that without explicit memory? 📄 Our new ICML paper investigates this in a synthetic setting! 🧵 1/13
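As a rough picture of the kind of synthetic setting involved, here is a hypothetical variable-tracking task generator (illustrative only, not the paper's actual construction):

```python
# Hypothetical sketch of a synthetic variable-tracking task (not the paper's setup):
# generate sequences like "a = 3 ; b = 7 ; a = 5 ; query a" and ask for the latest value.
import random

def make_example(num_vars=3, num_assignments=5, seed=None):
    rng = random.Random(seed)
    names = [chr(ord("a") + i) for i in range(num_vars)]
    env, tokens = {}, []
    for _ in range(num_assignments):
        name, value = rng.choice(names), rng.randint(0, 9)
        env[name] = value                      # overwrite: only the latest binding counts
        tokens += [name, "=", str(value), ";"]
    query = rng.choice(list(env))
    tokens += ["query", query]
    return " ".join(tokens), env[query]        # (prompt, correct answer)

prompt, answer = make_example(seed=0)
print(prompt, "->", answer)
```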
I've been yapping for months about bad evaluation setups and how results/AI behaviors are reported, and this new @AISecurityInst paper makes the case much more clearly. In short: There's a massive difference between showing a model can do something sketchy versus showing it tends to…
😈 Here's why you should not worry that models will start blackmailing you out of nowhere: 1. At their heart, LLMs are pattern-matching and prediction engines. Given an input, they predict the most statistically likely continuation based on the vast dataset they were trained on.…
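A toy illustration of that "predict the most statistically likely continuation" picture, with bigram counts standing in for a real LLM:

```python
# Toy illustration of "pattern matching and prediction": a bigram model that always
# emits the statistically most likely next token seen in its training data.
from collections import Counter, defaultdict

corpus = "the model predicts the next token given the previous token".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1                 # count continuations observed in training

def most_likely_continuation(prev_token):
    # Greedy "decoding": pick the continuation with the highest empirical probability.
    return counts[prev_token].most_common(1)[0][0]

print(most_likely_continuation("the"))     # -> whichever word most often follows "the"
```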
It's been a busy week for the Anthropic interpretability team, with more to come in the near future! I wanted to recap some of the things we shared.
Important lessons on rigorous evaluation of AI model behaviors. Drawing on the historical example (and fun story) of hype around "chimps learning language". Given the importance of AI safety research, rigor and credibility are absolutely necessary. A great read from the folks at…
In a new paper, we examine recent claims that AI systems have been observed ‘scheming’, or making strategic attempts to mislead humans. We argue that to test these claims properly, more rigorous methods are needed.
After I left OpenAI, I knew I wanted to be at a non-profit but wasn't sure whether to join or start one. Ultimately I started one bc [long story redacted] but RAND is one I considered + their pivot to taking AI seriously is a great thing for the world: x.com/ohlennart/stat…
My team at RAND is hiring! Technical analysis for AI policy is desperately needed. Particularly keen on ML engineers and semiconductor experts eager to shape AI policy. Also seeking excellent generalists excited to join our fast-paced, impact-oriented team. Links below.
A single reinforcement learning system is key here! As in this figure, I believe it won't take too long until the models we release will generally outperform the variants that competed in AtCoder and at the IMO.
To summarize this week:
- we released a general-purpose computer-using agent
- got beaten by a single human in the AtCoder heuristics competition
- solved 5/6 new IMO problems with natural-language proofs
All of those are based on the same single reinforcement learning system
8/N Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.
"I use AI in a separate window. I don't enjoy Cursor or Windsurf, I can literally feel competence draining out of my fingers." @dhh, the legendary programmer and creator of Ruby on Rails has the most beautiful and philosophical idea about what AI takes away from programmers.
(7/n) there are one to two OOMs more math, in both breadth and depth, than the typical person here imagines there is. And that is only the math that is known. The unknown is no doubt vastly larger.
A huge component of the AI Security Institute's impact is tied to the scientific quality of our capability evaluations of LLMs. If you find details of rigorous experimental design exciting, please apply to Coz's team!
We're hiring a Senior Researcher for the Science of Evaluation team! We are an internal red team, stress-testing the methods and evidence behind AISI's evaluations. If you're sharp, methodologically rigorous, and want to shape research and policy, this role might be for you! 🧵
The AISI Whitebox Control Team is doing cool investigations into how well linear probes work, and has a new post sharing nuanced in-progress work. The results are mixed, in interesting ways! Please see Joseph's thread for details! I have only high-level observations. 🧵
🧵 1/13 My new team at UK AISI - the White Box Control Team - has released progress updates! We've been investigating whether AI systems could deliberately underperform on evaluations without us noticing. Key findings below 👇
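A generic sketch of the linear-probe idea these posts discuss, with synthetic activations and labels standing in for real model internals (not AISI's actual pipeline):

```python
# Generic sketch (not AISI's actual pipeline): fit a linear probe on hidden activations
# to separate two behavioural conditions, e.g. "trying" vs "underperforming" runs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model, n = 512, 2000
X = rng.normal(size=(n, d_model))              # stand-in for residual-stream activations
w_true = rng.normal(size=d_model)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(int)   # synthetic condition labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out probe accuracy:", probe.score(X_te, y_te))
```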
Giorgio Parisi opening StatPhys29: AI resembles the heat engine in that the technology arrived before the theory.
Reward Learning is just supervised learning, and so should be equally safe, right? Wrong! Our paper “The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret” shows that policy optimization causes issues. It was accepted to ICML! 🧵
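A toy numerical illustration of the gap (the numbers and the three-action setup are made up for illustration, not taken from the paper):

```python
# Toy illustration (not the paper's formal setup): a learned reward can have tiny error
# on the training distribution, yet the policy that maximizes it incurs large regret.
import numpy as np

true_reward    = np.array([1.00, 0.90, 0.10])   # three actions
learned_reward = np.array([0.99, 0.91, 1.50])   # accurate on 0 and 1, wrong on 2
train_dist     = np.array([0.50, 0.49, 0.01])   # action 2 is almost never seen in training

train_error = train_dist @ np.abs(true_reward - learned_reward)
print("expected training error:", train_error)   # small, ~0.02

greedy = int(np.argmax(learned_reward))          # policy optimization exploits the error
regret = true_reward.max() - true_reward[greedy]
print("regret of optimizing the learned reward:", regret)  # large, 0.9
```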
log(n): grows very slowly with n
loglog(n): bounded above by 4
logloglog(n): constant
loglogloglogloglogloglog(n): decreasing
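The joke, checked numerically with natural logs for a few large n:

```python
# Iterated logs flatten out fast: log(n), loglog(n), logloglog(n) for large n.
import math

def iter_log(n, k):
    x = float(n)
    for _ in range(k):
        x = math.log(x)
    return x

for n in (10**10, 10**40, 10**80):
    print(n, [round(iter_log(n, k), 3) for k in (1, 2, 3)])
```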
How do transformers carry out recurrent computations while being fundamentally feedforward? Excited to present our work on Constrained Belief Updating at #ICML2025, where we show that attention carries out a spectral algorithm in order to parallelize Bayes updating.
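For context, the recurrent computation in question is Bayesian belief updating over a hidden process. Here is a minimal sequential version with toy matrices (the claim being that attention can carry out this update in a parallelized, spectral way, not that it runs this loop):

```python
# Minimal recurrent Bayes filter over a toy hidden Markov model: the sequential
# belief update that a feedforward transformer would have to parallelize.
import numpy as np

T = np.array([[0.9, 0.1],       # transition matrix P(s' | s)
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],       # emission matrix  P(o  | s)
              [0.4, 0.6]])

def update(belief, obs):
    # One Bayes step: predict with T, then condition on the observation.
    predicted = belief @ T
    posterior = predicted * E[:, obs]
    return posterior / posterior.sum()

belief = np.array([0.5, 0.5])
for obs in [0, 0, 1, 0]:
    belief = update(belief, obs)
print("belief over hidden states:", belief)
```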
Sensitivity and Sharpness of n-Simplicial Attention. On the topic of stabilizing training, I got unreasonably nerd-sniped by 2-simplicial attention and ended up deriving the sensitivity and sharpness bounds of n-simplicial attention more generally...
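A minimal sketch of the trilinear attention pattern behind 2-simplicial attention, assuming the simple elementwise trilinear logit and an elementwise-product value combination (both are illustrative choices, not the exact construction in the derivation):

```python
# Sketch of 2-simplicial (trilinear) attention: each query attends to *pairs* of positions.
import numpy as np

def two_simplicial_attention(q, k1, k2, v1, v2):
    # logits[i, j, k] = sum_d q[i,d] * k1[j,d] * k2[k,d]   (a trilinear form)
    logits = np.einsum("id,jd,kd->ijk", q, k1, k2) / np.sqrt(q.shape[-1])
    # softmax over the pair (j, k) of attended positions
    flat = logits.reshape(q.shape[0], -1)
    w = np.exp(flat - flat.max(axis=-1, keepdims=True))
    w = (w / w.sum(axis=-1, keepdims=True)).reshape(logits.shape)
    # combine the two value streams, here with an elementwise product
    pair_values = np.einsum("jd,kd->jkd", v1, v2)
    return np.einsum("ijk,jkd->id", w, pair_values)

n, d = 5, 8
rng = np.random.default_rng(0)
out = two_simplicial_attention(*(rng.normal(size=(n, d)) for _ in range(5)))
print(out.shape)   # (5, 8)
```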
I want you all to read @Kimi_Moonshot's technical report on K2, then go back to this thread. Awesome work by @Jianlin_S and team! x.com/Yuchenj_UW/sta…
And another reason is, 3. AI safety. If our weights are 'too massive', then the model's outputs would be too sensitive to changes in the inputs. Nobody wants to be near a robot which would just smack people in the face cuz of some tiny fluke in the sensors. Controlled weights => little to no…
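A rough numerical version of that sensitivity argument for a single linear layer, where the output change is bounded by the weight norm times the input perturbation, ||W(x + e) - Wx|| <= ||W|| ||e||:

```python
# Larger weights amplify the same tiny sensor fluke into a larger output change.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)
noise = 1e-3 * rng.normal(size=16)            # a "tiny fluke in the sensors"

for scale in (0.1, 1.0, 100.0):
    W = scale * rng.normal(size=(16, 16))
    delta = np.linalg.norm(W @ (x + noise) - W @ x)
    print(f"weight scale {scale:>6}: output change {delta:.4f}")
```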