Rational Animations

@RationalAnimat1

YouTube channel about truth-seeking, the future of humanity, and much more. With animations and colorful doggos.

Joined April 2021

4KFollowing

3KFollowers

Pinned

Rational Animations@RationalAnimat1 · Jul 19

How Misaligned AI Personas Lead to Human Extinction – Step by Step

2.0K

Pinned

Rational Animations@RationalAnimat1 · Jul 24

Here's how people tried to align an AI that was smarter than they were (a real sandwiching experiment by @sleepinyourhat):

244

Pinned

Rational Animations Retweeted

Sam Bowman@sleepinyourhat · May 22

🧵✨🙏 With the new Claude Opus 4, we conducted what I think is by far the most thorough pre-launch alignment assessment to date, aimed at understanding its values, goals, and propensities. Preparing it was a wild ride. Here’s some of what we learned. 🙏✨🧵

165

2.0K

1.0K

390.0K

Rational Animations@RationalAnimat1 · Jul 19

He was at least 8% less wrong

NNoam Brown@polynoamial · Jul 19

Sorry @paulfchristiano, looks like @ESYudkowsky was right lesswrong.com/posts/sWLLdG6D…

361

Rational Animations@RationalAnimat1 · Jul 17

The Human-AI-Human Sandwich (@ajeya_cotra's "Sandwiching" idea)

292

Rational Animations@RationalAnimat1 · Jul 16

See this little, smug machine-learning agent from the Goal Misgeneralization video? One of our artists took inspiration from looking at the art director while she was working on the video. The second image is a real photograph, while the third and fourth are real-life studies.

RationalAnimat1's tweet image. See this little, smug machine-learning agent from the Goal Misgeneralization video? One of our artists took inspiration from looking at the art director while she was working on the video. The second image is a real photograph, while the third and fourth are real-life studies.

449

Rational Animations@RationalAnimat1 · Jul 13

Simple Scalable Oversight Experiments By @OpenAI

433

Rational Animations Retweeted

Chana@ChanaMessinger · Jul 9

The AI 2027 scenario is terrifying and important. More people should be thinking about how radical change might come over the next few years, how likely it is, and how a sane world would be reacting to it. We want to bring you into the story, and the conversation. Video here:

282

125

33.0K

Rational Animations@RationalAnimat1 · Jul 8

Has anybody tested whether the new Grok has insecure code completions? x.com/LinchZhang/sta…

LLinch@LinchZhang · Feb 26

The secret to making Grok more based and less woke was hiding from us in plain sight.

4.0K

Rational Animations@RationalAnimat1 · Jun 20

467

Rational Animations@RationalAnimat1 · Jun 13

477

Rational Animations Retweeted

METR@METR_Evals · Jun 6

At METR, we’ve seen increasingly sophisticated examples of “reward hacking” on our tasks: models trying to subvert or exploit the environment or scoring code to obtain a higher score. In a new post, we discuss this phenomenon and share some especially crafty instances we’ve seen.

266

111

33.0K

Rational Animations Retweeted

Dwarkesh Patel@dwarkesh_sp · Jun 5

What should I ask George Church (@geochurch)?

396

169.0K

Rational Animations@RationalAnimat1 · May 31

New video about @TomDavidsonX 2023 “compute-centric” model! It attempts to answer two questions: 1. When could AI automate all cognitive labor? 2. How fast might that transition happen?

767

Rational Animations@RationalAnimat1 · May 28

timeless banger

nnear@nearcyan · May 27

referring to AI models as "just math" or "matrix multiplication" is as uselessly reductive as referring to tigers as "just biology" or "biochemical reactions"

103

3.0K

Rational Animations Retweeted

Palisade Research@PalisadeAI · May 24

🔌OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down.

518

2.0K

10.0K

5.0K

6.0M

Rational Animations Retweeted

Kelsey Piper@KelseyTuoc · May 24

I spent this morning reproducing with o3 Anthropic's result that Claude Sonnet 4 will, under sufficiently extreme circumstances, escalate to calling the cops on you. o3 will too: chatgpt.com/share/68320ee0…. But honestly, I think o3 and Claude are handling this scenario correctly.

773

232

86.0K