AI Safety Papers
@safe_paper
Sharing the latest in AI safety research. "One who says they have no time to read papers will never read papers, even when they have time aplenty."
Joined May 2023
211 Following · 2K Followers
AI Safety Papers @safe_paper · 21h
LLMs Encode Harmfulness and Refusal Separately
Jiachen Zhao (@jcz12856876), Jing Huang, Zhengxuan Wu (@ZhengxuanZenWu), @davidbau, Weiyan Shi (@shi_weiyan)
