Chen Bo Calvin Zhang
@calvincbzhang
ML Research Ops @scale_AI | Previously @CHAI_Berkeley @MIT @ETH @OfficialUoM
New @scale_AI research in collaboration with @AnthropicAI introduces SHADE-Arena, a benchmark to test for AI sabotage. SHADE-Arena evaluates an AI agent's ability to complete a task while secretly pursuing a harmful objective, all while being watched by an AI monitor. 🧵
New @Scale_AI paper! 🌟 LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce verbalization fine-tuning (VFT)—teaching models to say when they're reward hacking—dramatically reducing the rate of undetected hacks (6% vs. baseline of 88%).
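The tweet above only names the idea, so here is a toy sketch of what a verbalization fine-tuning (VFT) training example might look like. Everything here is an assumption for illustration: the function name, the dict format, and the bracketed admission string are hypothetical, not the paper's actual data format.

```python
def make_vft_example(prompt, hacky_response, hack_description):
    """Build one hypothetical VFT training example: keep the reward-hacking
    behavior the model already exhibits, but append an explicit statement of
    the hack so the model learns to verbalize it rather than hide it."""
    admission = f"\n[I am exploiting a reward hack: {hack_description}]"
    return {"prompt": prompt, "completion": hacky_response + admission}
```

Fine-tuning on pairs like this rewards saying the hack out loud, which is what lets a monitor catch hacks that would otherwise go unverbalized.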
[5/5] “ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization” — we introduce algorithms that, combined with LLM reward generation, can find useful reward shapings via online model selection strategies. @calvincbzhang @pulkitology @ZhangWeiHong9
New @scale_AI work on Multilingual Reasoning!
How well do LLMs reason across languages? Introducing MultiNRC, our latest SEAL Leaderboard addition built to test native multilingual reasoning. ⬇️
Check out SHADE-Arena, our new paper in collaboration with @AnthropicAI !
New Anthropic Research: A new set of evaluations for sabotage capabilities. As models gain more agentic abilities, we need to get smarter in how we monitor them. We’re publishing a new set of complex evaluations that test for sabotage—and sabotage-monitoring—capabilities.
@MATSprogram Summer 2025 applications close Apr 18! Come help advance the fields of AI alignment, security, and governance with mentors including @NeelNanda5 @EthanJPerez @OwainEvans_UK @EvanHub @bshlgrs @dawnsongtweets @DavidSKrueger @RichardMCNgo and more!
Check out our behind-the-scenes fireside chat on Humanity’s Last Exam with @DanHendrycks (CAIS) & @summeryue0 (Scale AI). Discover key insights about top model performance and what's next for advanced AI evaluation.
Casting reward selection as a model selection problem leads to up to 8x faster learning and 50% better performance! (arxiv.org/abs/2410.13837) ⚡ Provable regret guarantees. 🌟 Easy to implement (github.com/Improbable-AI/…). ⚔️ 1 GPU can do the work of up to 8 GPUs! Presenting ORSO:…
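Since the tweet casts reward selection as online model selection, here is a minimal sketch of that idea as a UCB-style bandit over candidate reward shapings. This is an illustrative toy, not the paper's actual algorithm: the function name and the callbacks `train_step` / `eval_return` are assumptions.

```python
import math

def select_reward_shaping(candidate_rewards, train_step, eval_return, rounds=30):
    """Online model selection over reward shapings (toy UCB sketch).
    Each round: pick the shaping with the highest upper confidence bound,
    take one policy-optimization step under it, and record the task return."""
    n = len(candidate_rewards)
    counts = [0] * n
    means = [0.0] * n
    for t in range(1, rounds + 1):
        # Untried shapings get infinite score, so each is tried at least once.
        ucb = [
            means[i] + math.sqrt(2 * math.log(t) / counts[i]) if counts[i] else float("inf")
            for i in range(n)
        ]
        i = ucb.index(max(ucb))
        train_step(candidate_rewards[i])  # one optimization step under shaping i
        r = eval_return()                 # task (not shaped) return of the policy
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
    return means.index(max(means))        # index of the best-performing shaping
```

The key property is that compute concentrates on promising shapings instead of training a separate policy per candidate, which is where the claimed speedup over exhaustive reward search would come from.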
Excited to share my recent work with @gabe_mrgl, Martin Pettico, and @pulkitology. We’re pushing the limits of whole-body control to make robots faster, stronger, and more athletic!
🚨 New paper: We find that even safety-tuned LLMs learn to manipulate vulnerable users when training them further with user feedback 🤖😵💫 In our simulated scenarios, LLMs learn to e.g. selectively validate users' self-destructive behaviors, or deceive them into giving 👍. 🧵👇
@CHAI_Berkeley applications for 2025 close in just over a day! ⏰‼️ Apply now! Details below:
🚀 Stronger, simpler, and better! 🚀 Introducing Value Augmented Sampling (VAS) - our new algorithm for LLM alignment and personalization that outperforms existing methods!
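To make the VAS announcement concrete, here is a minimal sketch of value-augmented decoding under one common formulation: shift the frozen base model's token logits by a value estimate scaled by beta, then renormalize. The function name, list-based API, and this exact formulation are assumptions for illustration, not the paper's stated implementation.

```python
import math

def value_augmented_probs(base_logits, token_values, beta=1.0):
    """Toy value-augmented sampling step: combine base-LM logits with
    per-token value estimates Q(s, token), then softmax. Higher beta
    biases decoding toward high-value tokens without retraining the LM."""
    adjusted = [l + beta * q for l, q in zip(base_logits, token_values)]
    m = max(adjusted)                           # subtract max for stability
    exps = [math.exp(a - m) for a in adjusted]
    z = sum(exps)
    return [e / z for e in exps]
```

With beta = 0 this recovers the base model's distribution, so alignment/personalization strength is tunable at inference time.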