Lee Sharkey
@leedsharkey
Scruting matrices @ Goodfire | Previously: cofounded Apollo Research
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team! deepmind.google/discover/blog/…
Who knew you could win gold in the International Math Olympiad without truly reasoning?
Just wrote a piece on why I believe interpretability is AI’s most important frontier - we're building the most powerful technology in history, but still can't reliably engineer or understand our models. With rapidly improving model capabilities, interpretability is more urgent,…
Very good. Very fire.
I've joined @GoodfireAI (London team) because I think it's the best place to develop and scale fundamental interpretability techniques. Doing this well requires compute, ambition, and most of all, great people. Goodfire has all of these.
New research update! We replicated @AnthropicAI's circuit tracing methods to test if they can recover a known, simple transformer mechanism.
I had a lot of fun chatting with Daniel on the AXRP podcast! We chatted about our ongoing interpretability research agenda, which started with Attribution-based Parameter Decomposition. Also lol "SAE killer" - how far we've come! 😂
New episode with @leedsharkey on his new line of research, APD! I hope you'll enjoy listening as much as I enjoyed recording it :) Video link in reply.
Painting with interpretability tools is very fun!
We created a canvas that plugs into an image model’s brain. You can use it to generate images in real-time by painting with the latent concepts the model has learned. Try out Paint with Ember for yourself 👇
A few months ago I resigned from my tenured position at the University of Melbourne and joined Timaeus as Director of Research. Timaeus is an AI safety non-profit research organisation. [1/n]🧵
A great new resource for mech interp research!
Introducing SimpleStories: A synthetic story dataset and model suite designed for understanding the internals and learning dynamics of LMs. It's an evolution from TinyStories and leverages better LMs for data generation and offers more data diversity. 🧵