Kaiwen Wang
@kaiwenw_ai
RL PhD @Cornell_Tech. @Google PhD Fellow.
Steerability is the next frontier of generative models! Having knobs that control the behavior of AI systems will greatly improve their safety & usability. I’m very excited to present ✨Conditional Language Policy (CLP)✨, a multi-objective RL framework for steering language…
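The core idea behind multi-objective steering can be sketched with a toy scalarization, where one policy is conditioned on a preference weight that blends per-objective rewards. This is a minimal illustrative sketch, not the CLP method itself; the objective names are hypothetical.

```python
# Hypothetical sketch of multi-objective reward scalarization, the idea
# underlying steerable policies: condition one policy on a preference
# weight vector and train against the weighted reward. The objectives
# ("helpfulness", "brevity") are illustrative, not from the paper.

def scalarized_reward(rewards: dict, weights: dict) -> float:
    """Combine per-objective rewards into one scalar via a convex combination."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[k] * rewards[k] for k in rewards)

# At inference time the same weight vector acts as the steering knob:
rewards = {"helpfulness": 0.8, "brevity": 0.3}
print(scalarized_reward(rewards, {"helpfulness": 1.0, "brevity": 0.0}))  # 0.8
print(scalarized_reward(rewards, {"helpfulness": 0.5, "brevity": 0.5}))  # 0.55
```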

Correction re: the time: my posters on Q# and VGS at @ai4mathworkshop are happening today from 10:50 am to 12:20 pm. Hope to see you there! x.com/kaiwenw_ai/sta…

I’m presenting two papers on value-based RL for post-training & reasoning on Friday at @ai4mathworkshop at #ICML2025! 1️⃣ Q#: lays theoretical foundations for value-based RL for post-training LMs; 2️⃣ VGS: practical value-guided search scaled up for long CoT reasoning. 🧵👇
It's happening today! 📍Location: West Ballroom C, Vancouver Convention Center ⌚️Time: 8:30 am - 6:00 pm 🎥 Livestream: icml.cc/virtual/2025/w… #ICML2025 #icml25 #icml #aiformath #ai4math #workshop
This captures something fundamental we're seeing in AI right now! The shift from just scaling pre-training to scaling test-time compute is huge. Our Q# + VGS work shows how value-based methods can guide models through the vast implicit graphs of reasoning possibilities.
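The "guiding models through implicit graphs of reasoning possibilities" idea can be sketched as a value-guided beam search: expand partial reasoning chains and keep the top-k ranked by a value function. This is a toy sketch under assumptions, not the VGS implementation; `expand` and `value` are stand-ins for an LM's next-step proposals and a trained value model.

```python
# Minimal, hypothetical sketch of value-guided search: expand partial
# reasoning chains and keep the top-k scored by a value function.
# `expand` and `value` are toy stubs standing in for a language model's
# next-step proposals and a learned value model.
import heapq

def value_guided_search(root, expand, value, beam_width=2, depth=3):
    """Beam search over an implicit tree of reasoning steps, ranked by value."""
    beam = [root]
    for _ in range(depth):
        candidates = [child for node in beam for child in expand(node)]
        if not candidates:
            break
        # Keep only the beam_width highest-value partial chains.
        beam = heapq.nlargest(beam_width, candidates, key=value)
    return max(beam, key=value)

# Toy example: states are strings; the "value model" counts 'a' characters.
expand = lambda s: [s + "a", s + "b"]
value = lambda s: s.count("a")
print(value_guided_search("", expand, value))  # aaa
```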
How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini in math competition (AIME & HMMT) reasoning? Prior work seems to suggest that ideas like PRMs do not really work or scale well for long context reasoning. @kaiwenw_ai will reveal how a novel…
Are world models necessary to achieve human-level agents, or is there a model-free short-cut? Our new #ICML2025 paper tackles this question from first principles and finds a surprising answer: agents _are_ world models… 🧵
I've made FANG billions of $ with reinforcement learning, so this episode is a long time coming :-). Episode 180: Reinforcement Learning, drops on Monday! patreon.com/posts/180-lear…
Join us @pluralistic_ai workshop at #NeurIPS to learn more about CLP! 🗓️ Sat, 14 Dec, 2024 🕙 10:40-11:40am PST 📍 West Meeting Room 116, 117 🔗 arxiv.org/abs/2407.15762 x.com/kaiwenw_ai/sta…
Making inferences robust to distribution shifts and hidden confounders is paramount for decision making under uncertainty. At the upcoming @NeurIPSConf, I’m excited to present our efficient and sharp algorithm for off-policy evaluation in robust Markov decision processes. Many…
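For context, the classical baseline that robust off-policy evaluation builds on is the trajectory-wise importance-sampling estimator: reweight observed returns by the ratio of target- to behavior-policy probabilities. This is a sketch of that standard baseline only, not the robust-MDP algorithm from the paper.

```python
# Vanilla trajectory-wise importance-sampling (IS) off-policy evaluation:
# estimate the value of a target policy pi_e from trajectories collected
# under a behavior policy pi_b. This is the classical baseline, not the
# robust-MDP algorithm in the paper.

def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Each trajectory is a list of (state, action, reward) tuples;
    pi_e(a, s) and pi_b(a, s) return action probabilities."""
    total = 0.0
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative importance ratio
            ret += discount * r
            discount *= gamma
        total += weight * ret  # reweighted return for this trajectory
    return total / len(trajectories)

# Toy check: behavior policy is uniform over 2 actions; target policy
# always picks action 0, whose reward is 1.0.
data = [[(0, 0, 1.0)], [(0, 1, 2.0)]]
pi_e = lambda a, s: 1.0 if a == 0 else 0.0
pi_b = lambda a, s: 0.5
print(is_estimate(data, pi_e, pi_b))  # 1.0
```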

2022: I had never written an RL paper or worked with an RL researcher. I didn’t think RL was crucial for AGI. Now: I think about RL every day. My code is optimized for RL. The data I create is designed just for RL. I even view life through the lens of RL. Crazy how quickly life changes