Cassidy Laidlaw
@cassidy_laidlaw
PhD student at UC Berkeley studying RL and AI safety. Also at https://bsky.app/profile/cassidylaidlaw.bsky.social
We built an AI assistant that plays Minecraft with you. Start building a house—it figures out what you’re doing and jumps in to help. This assistant *wasn't* trained with RLHF. Instead, it's powered by *assistance games*, a better path forward for building AI assistants. 🧵
New work with @yaowenye123: DPO can fail completely when annotators don't provide accurate supervision. We propose an alternative post-training method called ILR that can still learn effectively from unreliable feedback!
What happens when humans can’t reliably supervise LLMs during RLHF? In a new paper, we find that unreliable supervision can cause DPO to fail completely. Instead of DPO/RLHF, we propose using human feedback to update the *SFT dataset* and show this works much better! 🧵