AI Safety Papers

@safe_paper

Sharing the latest in AI safety research. "One who says they have no time to read papers will never read papers, even when they have time aplenty."

Joined May 2023

211 Following

2K Followers

AI Safety Papers @safe_paper · 21 h

LLMs Encode Harmfulness and Refusal Separately

Jiachen Zhao (@jcz12856876), Jing Huang, Zhengxuan Wu (@ZhengxuanZenWu), @davidbau, Weiyan Shi (@shi_weiyan)
