Kaixuan Huang
@KaixuanHuang1
AGI strategist. PhD student @Princeton; Google PhD Fellowship 2024; ex-intern @GoogleDeepMind; undergrad @PKU1898. Opinions my own.
Do LLMs have true generalizable mathematical reasoning capability or are they merely memorizing problem-solving skills? 🤨 We present MATH-Perturb, modified level-5 problems from MATH dataset to benchmark LLMs' generalizability to slightly perturbed problems. 🔗…
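(Not the released MATH-Perturb harness, just a minimal sketch of the evaluation logic the post implies: score a model on the original and the perturbed version of each problem and look at the accuracy drop. The record fields and the `query_model` / `is_correct` callables are hypothetical placeholders.)
```python
def accuracy(problems, answers, query_model, is_correct):
    """Fraction of problems the model answers correctly."""
    hits = sum(is_correct(query_model(p), a) for p, a in zip(problems, answers))
    return hits / len(problems)

def generalization_gap(pairs, query_model, is_correct):
    """Accuracy drop from original to perturbed problems; a large drop
    suggests memorized solution templates rather than reasoning that
    transfers to slightly modified problems."""
    acc_orig = accuracy([p["original_problem"] for p in pairs],
                        [p["original_answer"] for p in pairs],
                        query_model, is_correct)
    acc_pert = accuracy([p["perturbed_problem"] for p in pairs],
                        [p["perturbed_answer"] for p in pairs],
                        query_model, is_correct)
    return acc_orig, acc_pert, acc_orig - acc_pert
```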

Just returned from ICML 2025 where I had the honor of keynoting three remarkable workshops. Grateful for the opportunity to delve into topics like self-evolving Alita agents, CRISPR-GPT for AI-driven science, Genome-Bench, reinforcement-learning agents, and AI biosafety. Special…
💔 2nd & 3rd deaths linked to Sarepta gene therapy: trial pause, stock drop. We must accelerate safer gene & cell cures. AI design & AI agents + real-world validation can help! 🚀 AI momentum: SynBioBeta’s “Towards an AI-Driven CRISPR Future” (synbiobeta.com/read/towards-a…) charts…
I'd like to see Meta building a lean LLM team around Narang, Allen-Zhu, Mike Lewis, Zettlemoyer and Sukhbaatar and giving them all the budget and power.
Given the sheer number of ppl interested in PG methods nowadays, I'm sure innocent "rediscoveries" like this are happening every day. OTOH, due diligence takes minimal effort today as you can just DeepResearch. All it takes is the sense/taste to ask "no way this is not done b4"...
I read this paper in detail, and I am very sad! They literally re-do the optimal reward baseline work that we have known since forever, without even crediting the true authors in their derivations. The third screenshot is taken from: ieeexplore.ieee.org/stamp/stamp.js… As you see, they…
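(For context, the "optimal reward baseline" in question is the classic variance-reduction result for REINFORCE: the constant baseline that minimizes the variance of the gradient estimate is the return average weighted by the squared norm of the score function, a result that appears in, e.g., Weaver & Tao (2001) and Greensmith et al. (2004). Below is a minimal numpy sketch of that textbook formula, not of the paper under discussion.)
```python
import numpy as np

def optimal_constant_baseline(grad_log_probs, returns):
    """Variance-minimizing constant baseline for the REINFORCE estimator
    g = grad_log_pi(a|s) * (R - b).

    Classic result: b* = E[||grad_log_pi||^2 * R] / E[||grad_log_pi||^2],
    i.e. a return average weighted by the squared score-function norm
    (rather than the plain mean return).

    grad_log_probs: array of shape (N, d), per-sample score vectors
    returns:        array of shape (N,),   per-sample returns
    """
    sq_norms = np.sum(grad_log_probs ** 2, axis=1)   # ||grad log pi||^2 per sample
    return np.sum(sq_norms * returns) / np.sum(sq_norms)

def reinforce_gradient(grad_log_probs, returns, baseline):
    """Baseline-subtracted policy-gradient estimate (mean over samples)."""
    return np.mean(grad_log_probs * (returns - baseline)[:, None], axis=0)
```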
Glad to see that CRISPR-GPT inspired wonderful work on general-purpose biomedical agents 😀 Congrats on the release of Biomni!
📢 Introducing Biomni - the first general-purpose biomedical AI agent. Biomni is built on the first unified environment for biomedical agents, with 150 tools, 59 databases, and 106 software packages, and a generalist agent design with retrieval, planning, and code as action. This…
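(A generic illustration of the retrieval + planning + code-as-action loop described above, not Biomni's actual implementation; `llm`, `retrieve`, and `execute` are hypothetical callables.)
```python
def code_as_action_agent(task, llm, retrieve, execute, max_steps=10):
    """Minimal code-as-action agent loop.
    llm(prompt) -> str        proposes the next step: code, or a final answer
    retrieve(query) -> str    returns relevant tool/database documentation
    execute(code) -> str      runs the code in the environment, returns output
    """
    context = retrieve(task)                        # ground the agent in relevant tools/docs
    history = [f"Task: {task}", f"Context: {context}"]
    for _ in range(max_steps):
        step = llm("\n".join(history) + "\nNext action (python code or FINAL: answer):")
        if step.strip().startswith("FINAL:"):       # agent decides it is done
            return step.strip()[len("FINAL:"):].strip()
        observation = execute(step)                 # code is the action; run it, observe
        history.append(f"Action:\n{step}\nObservation:\n{observation}")
    return "No answer within step budget."
```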
The next ~1-4 years will be about taking the 2017-2020 era of Deep RL and scaling it up: exploration, generalization, long-horizon tasks, credit assignment, continual learning, multi-agent interaction! Lots of cool work to be done! 🎮🤖 But we shouldn't forget big lessons from back…
"Speeding up LLMs using discrete diffusion models", now by Gemini Diffusion. Whoever has access, please tell me whether the model only supports deterministic generations --- outputting the same response every time it's given the same input.
I just found my old email written on Dec 4, 2023, where I talked about the three research directions I am most excited about. After 1.5 years: (1) was done by o1, QwQ, Deepseek-R1, etc. (2) is being explored in @InceptionAILabs. It seems (3) is still ongoing and hasn't been…
2025 is the year of benchmarks and agents. 2026 will be the year of the unified world model.
Evaluations are essential to understanding how models perform in health settings. HealthBench is a new evaluation benchmark, developed with input from 250+ physicians from around the world, now available in our GitHub repository. openai.com/index/healthbe…
Congrats to Kai Li on being named a member of the American Academy of Arts & Sciences! 🎉 Li joined @Princeton in 1986 and has made important contributions to several research areas in computer science. bit.ly/3RPLxas
Thrilled to know that our paper, `Safety Alignment Should be Made More Than Just a Few Tokens Deep`, received the ICLR 2025 Outstanding Paper Award. We sincerely thank the ICLR committee for awarding one of this year's Outstanding Paper Awards to AI Safety / Adversarial ML.…
Outstanding Papers: Safety Alignment Should be Made More Than Just a Few Tokens Deep (Xiangyu Qi, et al.); Learning Dynamics of LLM Finetuning (Yi Ren and Danica J. Sutherland); AlphaEdit: Null-Space Constrained Model Editing for Language Models (Junfeng Fang, et al.).
When I tested the performance of o3-mini on MATH-Perturb, I found that it performed significantly worse than o1-mini. After inspecting the raw outputs, I discovered that o3-mini used a lot of Unicode characters, and my previous parser failed to process them. So I hand-crafted a…
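(A hedged sketch of the kind of Unicode normalization such a parser needs; the symbol table and helpers below are illustrative, not the parser actually used for MATH-Perturb.)
```python
import re
import unicodedata

# Illustrative mapping from Unicode math symbols that reasoning models often emit
# to the ASCII/LaTeX-style forms a simple answer parser expects.
UNICODE_TO_ASCII = {
    "−": "-",      # U+2212 minus sign
    "×": "*",      # multiplication sign
    "÷": "/",      # division sign
    "√": "sqrt",   # square root
    "π": "pi",
    "∞": "infinity",
}

def normalize_answer(text: str) -> str:
    """Normalize a model's raw output before extracting the final answer."""
    text = unicodedata.normalize("NFKC", text)   # fold full-width digits, superscripts, etc.
    for uni, ascii_ in UNICODE_TO_ASCII.items():
        text = text.replace(uni, ascii_)
    return text

def extract_final_answer(text: str):
    """Grab the last \\boxed{...} expression, a common MATH answer convention.
    (Simple pattern; does not handle nested braces.)"""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", normalize_answer(text))
    return matches[-1] if matches else None
```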

Life update: Following my recent graduation, I've joined the Bytedance Seed Edge team to pursue this research direction further. Although this post was written last year, my conviction in this approach has only strengthened (many ideas here echo compelling recent writings from…
x.com/i/article/1848…