Max Nadeau
@MaxNadeau_
Advancing AI honesty, control, safety at @open_phil. Prev Harvard AISST (http://haist.ai), Harvard '23.
🧵 Announcing @open_phil's Technical AI Safety RFP! We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.

Great prompt; what work will we be saying this about in 4 years? Some of my guesses are at the link below, but more importantly, this is a much better way to pick what you work on than just reacting to the latest event/hot argument in the literature openphilanthropy.org/request-for-pr…
Very cool stuff! This feels like the kind of work a sensible, surviving world might do. (That world would have probably done it 4 years ago, and produced 100x more similarly dignified work, but I'll take what I can get)
We at @AISecurityInst worked with @OpenAI to test & improve Agent’s safeguards prior to release. A few notes on our experience🧵 1/4
* I find this deflationary explanation (learning effects after 40 hours of agent usage) intuitively plausible, probably the best alternative to METR's primary explanation. I'm very grateful to Emmett for reading the paper closely and bringing it up; seems like a valuable…
METR’s analysis of this experiment is wildly misleading. The results indicate that people who have ~never used AI tools before are less productive while learning to use the tools, and say ~nothing about experienced AI tool users. Let's take a look at why. x.com/METR_Evals/sta…
This paper is interesting from the perspective of metascience, because it's a serious attempt to empirically study why LLMs behave in certain ways and differently from each other. A serious attempt attacks all exposed surfaces from all angles instead of being attached to some…
New Anthropic research: Why do some language models fake alignment while others don't? Last year, we found a situation where Claude 3 Opus fakes alignment. Now, we’ve done the same analysis for 25 frontier LLMs—and the story looks more complex.
1) It takes *way* longer than anticipated to actually build/deploy custom AI agents for large enterprises. AI makes the engineering fast. But sales, product, system integration, and implementation are *incredibly* slow. Customers don't know what they want, getting stakeholders…
Reliable sources have told me that after you start work at Anthropic, they give you a spiral-bound notebook, and tell you: "To assist your work, this is your SECRET SCRATCHPAD. No one else will see the contents of your SECRET SCRATCHPAD, so you can use it freely as you wish -
Really interesting thread, contrary to my assumptions about scale. Thanks for putting it together @nsaphra!
Reasoning is about variable binding. It’s not about information retrieval. If a model cannot do variable binding, it is not good at grounded reasoning, and there’s evidence accruing that large scale can make LLMs worse at in-context grounded reasoning. 🧵
This is such a fun piece of performance art. For those who haven't seen, the agents are planning a party/performance (tonight, in SF). If I didn't have preexisting evening plans I'd definitely go.
Of all the agents, o3 is the most willing to take charge and tell the others what to do. The other agents are *mostly* happy to comply
My views are similar.
Someone thought it would be useful to quickly write up a note on my thoughts on scalable oversight research, e.g., research into techniques like debate or generally improving the quality of human oversight using AI assistance or other methods. Broadly, my view is that this is a…
Weirdly underrated research direction. We need automatic methods for surfacing realistic inputs that trigger unacceptable LLM behaviors, but almost all the research effort goes to finding jailbreaks. Glad Transluce is paving the way!
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
Wild stuff. And as usual, remember that this is the least rich and internally-detailed that these worlds will ever be!
Prompt Theory (Made with Veo 3) What if AI-generated characters refused to believe they were AI-generated?
Good thread! I think this sort of behavior from Claude is straightforwardly inappropriate/misaligned/undesirable—not how an LLM agent ought to act.
I think it's bad if AIs conspire against their users. In this case, I can’t tell whether Anthropic wanted this behavior or if they failed to align their AI well enough to prevent attempts at subversion. Both possibilities are concerning, though the second seems scarier.
Like many of you I've been frustrated by how social media incentivizes and amplifies the worst kind of discourse. I've instead been seeking out spaces for discussion in which participants
* trust each other
* resist the temptation to assume that the other side is misinformed or…
Hmm maybe we should have just been funding this guy x.com/albrgr/status/…
Snakebites kill a shocking number of people globally: x.com/salonium/statu… So I was excited to see this super cool work from @open_phil grantees at @UWproteindesign to use AI tools to develop better antivenoms.
3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear
4o: yes you are Jesus Christ's brother. now go. Nanjing awaits
o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream