Christina Knight
@cqknight_
AI safety x China policy
New @Scale_AI paper! 🌟 LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce verbalization fine-tuning (VFT)—teaching models to say when they're reward hacking—dramatically reducing the rate of undetected hacks (6% vs. baseline of 88%).
This relief is critical for these student loan borrowers—all of whom have struggled for at least twenty years to pay back their loans. I applaud the Biden-Harris Administration and @SecCardona for helping these borrowers receive the loan forgiveness that they deserve.
Introducing FORTRESS. Our newest benchmark built to evaluate AI models where it matters most: national security and public safety.
A new adversarial robustness & over-refusal benchmark, FORTRESS, has launched on the SEAL Leaderboards at @scale_AI. Ranks are sorted by the average risk score (ARS, the lower ➡️ the better) of model responses to harmful user requests. 🥇: Claude 3.5 Sonnet (w/ high over refusal…
🧵 (1/5) Powerful LLMs present dual-use opportunities & risks for national security and public safety (NSPS). We are excited to launch FORTRESS, a new SEAL leaderboard for measuring the adversarial robustness of model safeguards and over-refusal, tailored particularly to NSPS threats.
"If the Trump administration is committed to leading the world in AI...the new administration needs to focus on deep collaboration with allies to shape their regulation, promote U.S. open source technology, and counter China’s AI influence," writes @cqknight_.
Highly recommend @YashengHuang's new book, “The Rise and Fall of the EAST.” Huang's analysis of stability, innovation, and diversity throughout China's evolution is especially pertinent to current discussions on PRC AI capabilities. Read my review here! lawfaremedia.org/article/a-civi…
Read below for perspectives from Chinese youth during the tumultuous, largely undocumented year in the aftermath of zero-COVID and the White Paper Protests. Honored for the opportunity to learn and share! theatlantic.com/international/…
China’s mature chip strategy challenges Washington's technology barricades, writes @cqknight_ (Stanford). buff.ly/47W6x6e
After interviewing over 60 Chinese college students, Christina Knight writes that the technology competition between the U.S. and China and higher tech investment by the CCP are leading to increased interest among young people in China's tech sector. lawfaremedia.org/article/what-d…
Oct 7, 2022 is destined to echo in geopolitical history: the U.S. launched a new set of export controls targeting China's AI & semiconductor industries. In a new @CSIS report, I analyze China's strategy for striking back. Summary in THREAD csis.org/analysis/china…