Zhiyong Wang @ICML 2025
@Zhiyong16403503
Ph.D. candidate at CUHK. Former Visiting Scholar at Cornell. Working on reinforcement learning and multi-armed bandits.
By incorporating self-consistency during offline RL training, we unlock three orthogonal directions of scaling: 1. efficient training (i.e., limiting backprop through time) 2. expressive model classes (e.g., flow matching) 3. inference-time scaling (sequential and parallel), which…
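On direction 1, here is a minimal sketch of what limiting backprop through time can look like (my generic illustration with a placeholder model and loss, not the paper's training loop): detach the recurrent state every K steps so gradients only flow through short windows.

```python
import torch

# Truncated backprop through time: gradients flow through at most K steps.
K = 4
cell = torch.nn.GRUCell(16, 32)            # placeholder recurrent model
opt = torch.optim.Adam(cell.parameters(), lr=1e-3)
h = torch.zeros(1, 32)

for t in range(64):
    x = torch.randn(1, 16)                 # placeholder input stream
    h = cell(x, h)
    if (t + 1) % K == 0:
        loss = h.pow(2).mean()             # placeholder loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        h = h.detach()                     # cut the gradient path here
```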
I won't be at #ICLR2025 myself this time around, but please go talk to lead authors @nico_espinosa_d, @GaoZhaolin, and @runzhe_wu about their bleeding-edge algorithms for imitation learning and RLHF!
I think of misspecification (embodiment / sensory gaps) as the fundamental reason behavioral cloning isn't "all you need" for imitation: matching actions != matching outcomes. Introducing @nico_espinosa_d's #ICLR2025 paper proving that "local search" *is* all you need! [1/n]
Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD
How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini on math-competition (AIME & HMMT) reasoning? Prior work seems to suggest that ideas like PRMs do not really work or scale well for long-context reasoning. @kaiwenw_ai will reveal how a novel…
I’m presenting two papers on value-based RL for post-training & reasoning on Friday at @ai4mathworkshop at #ICML2025! 1️⃣ Q#: lays theoretical foundations for value-based RL for post-training LMs; 2️⃣ VGS: practical value-guided search scaled up for long CoT reasoning. 🧵👇
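For a flavor of what value-guided search means here, a minimal beam-style sketch (my illustration, not the VGS implementation; `sample_continuations` and `value_model` are hypothetical stand-ins for an LM sampler and a learned value function over partial chains of thought):

```python
def value_guided_search(prompt, sample_continuations, value_model,
                        beam_width=4, expand=8, max_blocks=16):
    """Grow a reasoning trace block by block, keeping the candidates a
    value model scores highest (hypothetical interfaces, see above)."""
    beams = [prompt]
    for _ in range(max_blocks):
        # expand each beam with sampled continuation blocks
        candidates = [b + c for b in beams
                      for c in sample_continuations(b, n=expand)]
        # keep the partial solutions the value model likes best
        candidates.sort(key=value_model, reverse=True)
        beams = candidates[:beam_width]
        if all(b.endswith("<eos>") for b in beams):
            break
    return max(beams, key=value_model)
```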
Happy to share our work "Provable Zero-Shot Generalization in Offline Reinforcement Learning" at ICML 2025! 📍 Poster | 🗓️July 16, 11:00 AM – 1:30 PM 📌 West Exhibition Hall B2-B3 #W-1012 🤖 How can offline RL agents generalize zero-shot to unseen environments? We introduce…
Does RL actually learn anything positive under random rewards when optimizing Qwen on MATH? Is Qwen really so magical that even RLing on random rewards can make it reason better? Following prior work on spurious rewards in RL, we ablated algorithms. It turns out that if you…
Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find that this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: tinyurl.com/heuristics-con…
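To make the heuristics point concrete, here is a toy simulation (mine, not from the blog post) of one such heuristic, PPO/GRPO-style ratio clipping: with zero-mean random advantages the vanilla surrogate has roughly zero expected gradient, but clipping masks gradients asymmetrically depending on the advantage's sign, leaving a systematic nonzero drift even though the rewards carry no information.

```python
import torch

torch.manual_seed(0)
eps = 0.2                                        # clip range
ratio = torch.exp(0.3 * torch.randn(1_000_000))  # simulated pi_new/pi_old ratios
adv = torch.sign(torch.randn(1_000_000))         # random +/-1 advantages, mean ~0

# d(surrogate)/d(log pi) per token for the unclipped objective:
grad_unclipped = ratio * adv
# clipping zeroes the gradient once the ratio leaves [1-eps, 1+eps]
# on the side the advantage is pushing toward:
active = torch.where(adv > 0, ratio <= 1 + eps, ratio >= 1 - eps)
grad_clipped = grad_unclipped * active.float()

print(grad_unclipped.mean().item())  # ~0: random rewards give no net signal
print(grad_clipped.mean().item())    # systematically nonzero drift
```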
Curious how to combine federated learning and in-context learning for QA tasks — with privacy preservation, efficiency, and performance that improves round by round? 🚀 Meet Fed-ICL — our framework collaboratively refines answers without transmitting model weights or sharing raw…
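As a rough picture of the round-by-round idea (a hypothetical sketch, not the Fed-ICL protocol; `clients` exposing an `answer` method is an assumed interface): only answers travel between clients and the server, never weights or raw data.

```python
from collections import Counter

def fed_icl_round(question, clients, shared_context):
    # each client answers locally using the shared context plus its own
    # private in-context examples (which never leave the client)
    answers = [c.answer(question, context=shared_context) for c in clients]
    best, _ = Counter(answers).most_common(1)[0]   # e.g. majority vote
    # the aggregated answer joins the shared context for the next round
    return shared_context + [(question, best)], best
```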
Tired of over-optimized generations that stray too far from the base distribution? We present SLCD: Supervised Learning based Controllable Diffusion, which (provably) solves the KL-constrained reward maximization problem for diffusion through supervised learning! (1/n)
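For reference, the generic KL-constrained reward-maximization objective and its well-known closed-form optimum (standard notation, not necessarily the paper's):

```latex
\max_{p}\; \mathbb{E}_{x \sim p}[\, r(x) \,] - \beta\, \mathrm{KL}\!\left(p \,\|\, p_{\mathrm{ref}}\right)
\quad\Longrightarrow\quad
p^{*}(x) \propto p_{\mathrm{ref}}(x)\, \exp\!\big( r(x) / \beta \big)
```

The pitch, as I read it, is that SLCD reaches this optimum for diffusion models via supervised learning rather than RL-style optimization.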
Heading to #ICLR2025 🇸🇬! Excited to connect with friends and chat about RL: theory, LLM reasoning and robotics! I will present our Oral paper on LLM self-improvement📍4:18pm Sat. Join me if you want to learn about its scaling laws, iterative training and test-time improvement.
What is the place of exploration in today's AI landscape and in which settings can exploration algorithms address current open challenges? Join us to discuss this at our exciting workshop at @icmlconf 2025: EXAIT! exait-workshop.github.io #ICML2025
🚀 Rising Star Workshops for Junior/Senior PhDs, and Postdocs! 🌟 Don't miss these career-boosting opportunities! notion.so/List-of-Rising… Please share with your peers, students, and anyone who might benefit! #PhD #Postdoc #Academia #RisingStars
There are multiple postdoc positions available as part of an exciting new AI-agent initiative at Columbia that tackles challenges at the frontier of agentic systems and sequential decision-making. I am not very active here so please help me spread the word!
Extremely honored to receive this award. Credit goes to my collaborators, mentors, and especially my amazing students! #SloanFellow
🎉Congrats to the 126 early-career scientists who have been awarded a Sloan Research Fellowship this year! These exceptional scholars are drawn from 51 institutions across the US and Canada, and represent the next generation of groundbreaking researchers. sloan.org/fellowships/20…
List of accepted papers for AISTATS 2025 is now available. aistats.org/aistats2025/ Congratulations to the authors and thanks to the reviewers, ACs, and SACs for their help. Thanks to my co-chair @ashipra & workflow chairs: Christopher Anders (RIKEN) & Tingting Ou (Columbia).
Check this out: a new postdoc program for AI-related research in Catalunya! Our group is looking to hire within this program, ideally to work on topics related to RL theory. If you're interested, pls DM or email me. (Retweets appreciated!) ramonllull-aira.eu/application
SMILING😊 is accepted to #ICLR2025! Do not miss it if you're seeking an imitation learning algorithm with rigorous theory and strong empirical results!
Tired of unstable GAN-style discriminator training in inverse RL? We introduce SMILING😊, a simple IRL approach that just requires diffusion score matching (i.e. regression!). We use SMILING to solve complex humanoid tasks for the first time without any action labels!
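For intuition on why this is "just regression", here is a minimal denoising-score-matching sketch (my illustration of the generic technique; `ScoreNet`, the dimensions, and the single noise level are placeholders, not the SMILING architecture):

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Placeholder score network over (state, noise level)."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.ReLU(),
                                 nn.Linear(128, dim))
    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def dsm_loss(score_net, states, sigma=0.5):
    """Denoising score matching is plain MSE regression: predict the
    score of the Gaussian perturbation added to expert states."""
    noise = torch.randn_like(states)
    noisy = states + sigma * noise
    t = torch.full((states.shape[0], 1), sigma)
    target = -noise / sigma
    return ((score_net(noisy, t) - target) ** 2).mean()

states = torch.randn(256, 17)   # stand-in batch of expert *states* (no actions)
print(dsm_loss(ScoreNet(dim=17), states).item())
```

No discriminator and no adversarial min-max: the loss above is an ordinary regression objective.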
Join us at #ICLR2025 in Singapore! Submit your work at the intersection of machine learning and climate (biodiversity counts!) by Jan 31. We especially encourage submissions focused on: 🔢 data-centric methods and challenges 🌏 the Asia / Pacific region
We're excited to announce the next edition of our workshop "Tackling Climate Change with Machine Learning" at #ICLR2025 in Singapore! ▶️ Mentorship program deadline: Dec 27, 2024 ▶️ Paper submission deadline: Jan 31, 2025 Learn more & submit: climatechange.ai/events/iclr2025
Happy New Year! 🎉 2025 will be the only square year (45²) in many of our lifetimes.
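A quick check of the arithmetic: the neighboring square years are 44² = 1936 and 46² = 2116.

```python
print(45 ** 2)            # 2025
print(44 ** 2, 46 ** 2)   # 1936 2116
```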