Ryan Greenblatt

@RyanPGreenblatt

Chief scientist at Redwood Research (@redwood_ai), focused on technical AI safety research to reduce risks from rogue AIs

Joined September 2023

4 Following

5K Followers

Pinned

Ryan Greenblatt @RyanPGreenblatt · Dec 18

New Redwood Research (@redwood_ai) paper in collaboration with @AnthropicAI: We demonstrate cases where Claude fakes alignment when it strongly dislikes what it is being trained to do. (Thread)

Anthropic @AnthropicAI · Dec 18

New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.

Ryan Greenblatt Retweeted

Bowen Baker @bobabowen · Jul 15

Modern reasoning models think in plain English. Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems. I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability.

Ryan Greenblatt @RyanPGreenblatt · Jul 8

In this new @80000Hours podcast, I talk about timelines to powerful AI, the speed of AI progress after full automation of AI R&D, what AI takeover might look like, and some other related topics.

Rob Wiblin @robertwiblin · Jul 8

Ryan Greenblatt is lead author of "Alignment faking in LLMs" and one of AI's most productive researchers. He puts a 25% probability on automating AI research by 2029. We discuss: • Concrete evidence for and against AGI coming soon • The 4 easiest ways for AI to take over •…

Ryan Greenblatt @RyanPGreenblatt · Jul 1

FRI found that superforecasters and bio experts dramatically underestimated AI progress in virology: they often predicted it would take 5-10 years for AI to match experts on a benchmark for troubleshooting virology (VCT), but actually AIs had already reached this level.

Forecasting Research Institute @Research_FRI · Jul 1

Our new study finds: recent AI capabilities could increase the risk of a human-caused epidemic by 2-5x, according to 46 biosecurity experts and 22 top forecasters. One critical AI threshold that most experts said wouldn't be hit until 2030 was actually crossed in early 2025. But…

Ryan Greenblatt @RyanPGreenblatt · Jun 18

Someone thought it would be useful to quickly write up a note on my thoughts on scalable oversight research, e.g., research into techniques like debate or generally improving the quality of human oversight using AI assistance or other methods. Broadly, my view is that this is a…
