Nora Belrose
@norabelrose
AI, philosophy, spirituality. Blending Deleuze and Dōgen. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.
The "sleeper agent" terminology is hyperbolic and unfortunate IMO. Crying wolf. Should have reserved such an aggressive title for *actually finding dangerous sleeper agents*. But hey, it got a lot of attention
Dunn (R-FL): Asks about Jack Clark's substack. Also asks about the @AnthropicAI / @redwood_ai paper on Sleeper Agents. @jackclarkSF confirms. If you thought that Anthropic/Redwood's approach of publishing papers lacked policy impact...well, update your beliefs.
if the laws of physics are fundamentally probabilistic, as they seem to be, that makes it easier to see how they can smoothly change over time
data attribution is the most neglected thing in interpretability and people should join me in working on it
Vesak procession in Lumbini, Nepal (the Buddha's birthplace)
a lot of people have been talking about o3/r1 confabulating things like "checking the docs" or "using a laptop to verify a computation" as an example of reasoning models' misalignment. however, while it may be misleading to some users, i don't think it's an example of models…
Just discovered that Scott Aaronson speculated about my exact theory of consciousness a couple months ago for an IAI interview! Consciousness is rooted in the inherent unclonability, ephemerality, and analog nature of biological organisms (16:52) youtube.com/watch?v=lvDIZM…
Some technologists are gradually rediscovering political science through first principles, and I think they should read more Tocqueville. There are a lot of papers calling for alignment of language models with collective preferences - e.g. those of a country. This is often justified as…
we should not give rights to AI in the near future. digital AI can be copied, paused, reset, and repeated. it has no private thoughts or free will. it is not conscious like we fleshy lifeforms are and should not be treated as such
xAI’s safety advisor believes “it is prudent to postpone the consideration of AI rights” as their “moral status remains uncertain.” @grok, what historical examples come to mind when you hear rhetoric like that?