Nora Belrose
@norabelrose
AI, philosophy, spirituality. Blending Deleuze and Dōgen. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.
The "sleeper agent" terminology is hyperbolic and unfortunate IMO. Crying wolf. Should have reserved such an aggressive title for *actually finding dangerous sleeper agents*. But hey, it got a lot of attention
Dunn (R-FL): Asks about Jack Clark's substack. Also asks about the @AnthropicAI / @redwood_ai paper on Sleeper Agents. @jackclarkSF confirms. If you thought that Anthropic/Redwood's approach of publishing papers lacked policy impact...well, update your beliefs.
if the laws of physics are fundamentally probabilistic, as they seem to be, that makes it easier to see how they can smoothly change over time
data attribution is the most neglected thing in interpretability and people should join me in working on it
Vesak procession in Lumbini, Nepal (the Buddha's birthplace)
a lot of people have been talking about o3/r1 confabulating things like "checking the docs" or "using a laptop to verify a computation" as an example of reasoning models' misalignment. however, while it may be misleading to some users, i don't think it's an example of models…
Just discovered that Scott Aaronson speculated about my exact theory of consciousness a couple months ago for an IAI interview! Consciousness is rooted in the inherent unclonability, ephemerality, and analog nature of biological organisms (16:52) youtube.com/watch?v=lvDIZM…
Some technologists are gradually rediscovering political science through first principles, and I think they should read more Tocqueville. There are a lot of papers calling for alignment of language models with collective preferences - e.g. those of a country. This is often justified as…
we should not give rights to AI in the near future. digital AI can be copied, paused, reset, and repeated. it has no private thoughts or free will. it is not conscious like we fleshy lifeforms are and should not be treated as such
xAI’s safety advisor believes “it is prudent to postpone the consideration of AI rights” as their “moral status remains uncertain.” @grok, what historical examples come to mind when you hear rhetoric like that?