Adam Shai (@adamimos)
Do reasoning models like DeepSeek R1 learn their behavior from scratch? No! In our new paper, we extract steering vectors from a base model that induce backtracking in a distilled reasoning model, but surprisingly have no apparent effect on the base model itself! 🧵 (1/5)
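For readers who want to see the mechanics, here's a minimal, hypothetical sketch of the kind of activation steering the tweet describes: extract a difference-of-means direction from a base model's residual stream, then add it to a distilled reasoning model's activations via a forward hook. This assumes HuggingFace-style models with a Llama-style layer layout; the model IDs, layer, coefficient, and prompt sets are invented placeholders, not the paper's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "base-model-id"            # hypothetical placeholders,
TARGET_ID = "distilled-reasoner-id"  # not the paper's actual models

tok = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID)
target = AutoModelForCausalLM.from_pretrained(TARGET_ID)

LAYER = 12   # illustrative residual-stream layer
COEFF = 4.0  # steering strength, a free knob

# Contrastive prompt sets: contexts that do vs. don't precede backtracking
# (invented for illustration).
pos_prompts = ["... wait, that can't be right, let me reconsider:"]
neg_prompts = ["... so the answer follows directly:"]

@torch.no_grad()
def mean_last_token_resid(model, prompts, layer):
    """Mean residual-stream activation at `layer` over each prompt's last token."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        hs = model(ids, output_hidden_states=True).hidden_states
        acts.append(hs[layer][0, -1])
    return torch.stack(acts).mean(dim=0)

# Difference-of-means steering vector, extracted from the BASE model.
steer = (mean_last_token_resid(base, pos_prompts, LAYER)
         - mean_last_token_resid(base, neg_prompts, LAYER))

def add_steering(module, inputs, output):
    # Llama-style decoder layers return a tuple; hidden states come first.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + COEFF * steer
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# Apply the base-model vector inside the TARGET (distilled) model.
handle = target.model.layers[LAYER].register_forward_hook(add_steering)
out = target.generate(**tok("Solve: 17 * 24 = ?", return_tensors="pt"),
                      max_new_tokens=64)
print(tok.decode(out[0]))
handle.remove()
```

The surprising claim is that this same vector, hooked into the base model instead, does nothing apparent, which is what suggests the backtracking behavior isn't learned from scratch during distillation.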
What are the most beautiful research blogs presenting technical work? I'm a big fan of how Anthropic presents their transformer circuits work. Interested in others.
Remember when we were all talking about mech interp explanations for why transformers were bad at negation?
A few months ago I resigned from my tenured position at the University of Melbourne and joined Timaeus as Director of Research. Timaeus is an AI safety non-profit research organisation. [1/n]🧵
Today, the most competent AI systems in almost *any* domain (math, coding, etc.) are broadly knowledgeable across almost *every* domain. Does it have to be this way, or can we create truly narrow AI systems? In a new preprint, we explore some questions relevant to this goal...
New preprint alert! We often prompt ICL tasks using either demonstrations or instructions. How much does the form of the prompt matter to the task representation formed by a language model? Stick around to find out! 1/N
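Not the paper's setup, but a minimal sketch of the contrast the tweet describes: the same antonym task posed via demonstrations vs. an explicit instruction, comparing last-token hidden states as a crude proxy for the task representation. The model ID, layer, and prompts are all illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "some-causal-lm"  # hypothetical placeholder
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
LAYER = 10  # illustrative choice of layer

# Two prompt forms specifying the same task (antonyms).
demo_prompt = "hot -> cold\ntall -> short\nfast ->"
instr_prompt = "Output the antonym of the given word.\nfast ->"

@torch.no_grad()
def task_vec(prompt):
    """Hidden state at the final token, taken as a proxy task representation."""
    ids = tok(prompt, return_tensors="pt").input_ids
    hs = model(ids, output_hidden_states=True).hidden_states
    return hs[LAYER][0, -1]

sim = torch.nn.functional.cosine_similarity(
    task_vec(demo_prompt), task_vec(instr_prompt), dim=0
)
print(f"cosine similarity of task representations: {sim.item():.3f}")
```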
Our manuscript “AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability” arxiv.org/abs/2504.04608 has been accepted for the RL Conference! rl-conference.cc/index.html 🧵👇🏽
Without further advances in alignment, we risk optimizing for what we can easily measure (user engagement, unit tests passing, dollars earned) at the expense of what we actually care about.
Great post, "So you want to work in mechanistic interpretability", on skills to develop and resources to use, whether you're coming more from research or from engineering. (link in thread)
Man goes to doctor. "Doctor, I'm worried AGI will kill us all." "Don't worry," says doctor, "they wouldn't build it if they thought it might kill everyone." The man breaks down, sobbing. "But doctor, I *am* building AGI..."
Three more days to apply to work with us to build and apply a first-principles science of interpretability and intelligence!
Apply to work with me and Paul Riechers to build a science of AI interpretability. Help us extend our work predicting and finding fractals in the minds of transformers! MATS is one of the best ways to get into technical AI Safety!