Kaiwen Wang
@kaiwenw_ai
RL PhD @Cornell_Tech. @Google PhD Fellow.
Steerability is the next frontier of generative models! Having knobs that control the behavior of AI systems will greatly improve their safety & usability. I’m very excited to present ✨Conditional Language Policy (CLP)✨, a multi-objective RL framework for steering language…
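The core idea behind multi-objective steering can be sketched with a toy scalarization, where one policy is conditioned on a preference weight that blends per-objective rewards. This is a minimal illustrative sketch, not the CLP method itself; the objective names are hypothetical.

```python
# Hypothetical sketch of multi-objective reward scalarization, the idea
# underlying steerable policies: condition one policy on a preference
# weight vector and train against the weighted reward. The objectives
# ("helpfulness", "brevity") are illustrative, not from the paper.

def scalarized_reward(rewards: dict, weights: dict) -> float:
    """Combine per-objective rewards into one scalar via a convex combination."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[k] * rewards[k] for k in rewards)

# At inference time the same weight vector acts as the steering knob:
rewards = {"helpfulness": 0.8, "brevity": 0.3}
print(scalarized_reward(rewards, {"helpfulness": 1.0, "brevity": 0.0}))  # 0.8
print(scalarized_reward(rewards, {"helpfulness": 0.5, "brevity": 0.5}))  # 0.55
```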

Correction re: the time: my posters on Q# and VGS at @ai4mathworkshop are happening today from 10:50 am to 12:20 pm. Hope to see you there! x.com/kaiwenw_ai/sta…

I’m presenting two papers on value-based RL for post-training & reasoning on Friday at @ai4mathworkshop at #ICML2025! 1️⃣ Q#: lays theoretical foundations for value-based RL for post-training LMs; 2️⃣ VGS: practical value-guided search scaled up for long CoT reasoning. 🧵👇
It's happening today! 📍Location: West Ballroom C, Vancouver Convention Center ⌚️Time: 8:30 am - 6:00 pm 🎥 Livestream: icml.cc/virtual/2025/w… #ICML2025 #icml25 #icml #aiformath #ai4math #workshop
This captures something fundamental we're seeing in AI right now! The shift from just scaling pre-training to scaling test-time compute is huge. Our Q# + VGS work shows how value-based methods can guide models through the vast implicit graphs of reasoning possibilities.
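The "guiding models through implicit graphs of reasoning possibilities" idea can be sketched as a value-guided beam search: expand partial reasoning chains and keep the top-k ranked by a value function. This is a toy sketch under assumptions, not the VGS implementation; `expand` and `value` are stand-ins for an LM's next-step proposals and a trained value model.

```python
# Minimal, hypothetical sketch of value-guided search: expand partial
# reasoning chains and keep the top-k scored by a value function.
# `expand` and `value` are toy stubs standing in for a language model's
# next-step proposals and a learned value model.
import heapq

def value_guided_search(root, expand, value, beam_width=2, depth=3):
    """Beam search over an implicit tree of reasoning steps, ranked by value."""
    beam = [root]
    for _ in range(depth):
        candidates = [child for node in beam for child in expand(node)]
        if not candidates:
            break
        # Keep only the beam_width highest-value partial chains.
        beam = heapq.nlargest(beam_width, candidates, key=value)
    return max(beam, key=value)

# Toy example: states are strings; the "value model" counts 'a' characters.
expand = lambda s: [s + "a", s + "b"]
value = lambda s: s.count("a")
print(value_guided_search("", expand, value))  # aaa
```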
How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini in math competition (AIME & HMMT) reasoning? Prior work seems to suggest that ideas like PRMs do not really work or scale well for long context reasoning. @kaiwenw_ai will reveal how a novel…
Are world models necessary to achieve human-level agents, or is there a model-free short-cut? Our new #ICML2025 paper tackles this question from first principles and finds a surprising answer: agents _are_ world models… 🧵
I've made FANG billions of $ with reinforcement learning, so this episode is a long time coming :-). Episode 180: Reinforcement Learning, drops on Monday! patreon.com/posts/180-lear…
Join us @pluralistic_ai workshop at #NeurIPS to learn more about CLP! 🗓️ Sat, 14 Dec, 2024 🕙 10:40-11:40am PST 📍 West Meeting Room 116, 117 🔗 arxiv.org/abs/2407.15762 x.com/kaiwenw_ai/sta…
Making inferences robust to distribution shifts and hidden confounders is paramount for decision making under uncertainty. At the upcoming @NeurIPSConf, I’m excited to present our efficient and sharp algorithm for off-policy evaluation in robust Markov decision processes. Many…
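For context, the classical baseline that robust off-policy evaluation builds on is the trajectory-wise importance-sampling estimator: reweight observed returns by the ratio of target- to behavior-policy probabilities. This is a sketch of that standard baseline only, not the robust-MDP algorithm from the paper.

```python
# Vanilla trajectory-wise importance-sampling (IS) off-policy evaluation:
# estimate the value of a target policy pi_e from trajectories collected
# under a behavior policy pi_b. This is the classical baseline, not the
# robust-MDP algorithm in the paper.

def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Each trajectory is a list of (state, action, reward) tuples;
    pi_e(a, s) and pi_b(a, s) return action probabilities."""
    total = 0.0
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative importance ratio
            ret += discount * r
            discount *= gamma
        total += weight * ret  # reweighted return for this trajectory
    return total / len(trajectories)

# Toy check: behavior policy is uniform over 2 actions; target policy
# always picks action 0, whose reward is 1.0.
data = [[(0, 0, 1.0)], [(0, 1, 2.0)]]
pi_e = lambda a, s: 1.0 if a == 0 else 0.0
pi_b = lambda a, s: 0.5
print(is_estimate(data, pi_e, pi_b))  # 1.0
```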

2022: I had never written an RL paper or worked with an RL researcher. I didn’t think RL was crucial for AGI. Now: I think about RL every day. My code is optimized for RL. The data I create is designed just for RL. I even view life through the lens of RL. Crazy how quickly life changes