Haoran He
@tinner_he
Ph.D. student at @hkust, B.Eng. from @SJTU1896 | working on reinforcement learning, generative models (e.g., flow and diffusion models), and embodied AI.
🤩Mind-blowing discovery: Random policies can be surprisingly powerful for decision-making! Our ICML 2025 paper reveals how simple randomness leads to sophisticated reward-matching policies. Let me break this down...

Thrilled to share our #ICML2025 paper “The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep RL”, led by Jiashun Liu together with other great collaborators! We teach RL agents when to quit wasting effort, boosting efficiency with our proposed method LEAST. Here's the story 🧵👇🏾
A new asynchronous reinforcement learning framework for training LLMs at scale.
Introducing LlamaRL, a distributed RL framework for training LLMs at scale. LlamaRL is highly modular, PyTorch-native, customizes the optimization of actors/learners to max out throughput, and adjusts for systematic off-policyness to stabilize training. arxiv.org/pdf/2505.24034
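The tweet doesn't spell out how LlamaRL adjusts for off-policyness, but a standard correction when the learner consumes stale actor data is truncated importance sampling; here is a minimal sketch of that generic technique (not LlamaRL's actual code):

```python
import math

def truncated_is_weight(logp_learner: float, logp_actor: float, clip: float = 1.0) -> float:
    """rho = min(clip, pi_learner(a|s) / pi_actor(a|s)).

    Truncated importance sampling: reweight stale actor samples by the
    probability ratio between the current learner policy and the actor
    policy that generated them, clipping the ratio to bound variance.
    """
    return min(clip, math.exp(logp_learner - logp_actor))

# Learner assigns the action lower probability than the actor did: ratio < 1.
w_down = truncated_is_weight(math.log(0.4), math.log(0.5))  # 0.4 / 0.5 = 0.8
# Learner assigns it higher probability: ratio 2.0 is clipped to 1.0.
w_up = truncated_is_weight(math.log(0.4), math.log(0.2))
```

In an asynchronous setup these weights multiply the per-sample policy-gradient terms, so trajectories from out-of-date actors contribute less the further they drift from the current policy.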
Thanks to @_akhaliq for sharing our work! By reinterpreting the denoising trajectory as an evolutionary path, we demonstrate that image and video generation performance can be efficiently and effectively enhanced through an increased test-time computation budget.
Scaling Image and Video Generation via Test-Time Evolutionary Search
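The core loop of evolutionary test-time search can be sketched generically: keep the highest-scoring latents, mutate them, and spend more compute to improve the best sample. This is a toy illustration with hypothetical stand-ins (`generate`, `reward`) for the denoiser and the scorer, not the paper's actual method:

```python
import random

random.seed(0)

def generate(latent):
    # Stand-in for a denoiser decoding a latent into a sample (hypothetical).
    return [x * 0.5 for x in latent]

def reward(sample):
    # Stand-in scorer, e.g. a preference or quality model (hypothetical).
    return -sum((x - 0.5) ** 2 for x in sample) / len(sample)

def mutate(latent, scale=0.1):
    # Perturb a parent latent to explore nearby candidates.
    return [x + random.gauss(0.0, scale) for x in latent]

# Initial population of random latents.
population = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
best_initial = max(reward(generate(z)) for z in population)

for _ in range(30):  # more iterations = more test-time compute
    population.sort(key=lambda z: reward(generate(z)), reverse=True)
    elites = population[:4]               # keep the best candidates
    population = elites + [mutate(z) for z in elites]  # explore around them

best_final = max(reward(generate(z)) for z in population)
```

Because the elites are carried over unchanged each generation, the best score is monotone non-decreasing, so extra test-time budget can only help under this scheme.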
Happy 45²! 2025 = 45² is a "square" year. The last "square" year was 1936 and the next one will be 2116, so very few of us will live through two "square" years. Even better: 2025 = 45² is a "perfect square" year. It is the square of the sum of ALL the digits of the decimal numbering…
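The arithmetic in the tweet checks out, and is easy to verify in a few lines (a quick sanity check, not part of the original thread):

```python
import math

def is_square_year(year: int) -> bool:
    """True if the year is a perfect square."""
    root = math.isqrt(year)
    return root * root == year

# 2025 = 45², and 45 = 1 + 2 + ... + 9, the sum of all decimal digits.
assert sum(range(1, 10)) == 45
assert is_square_year(2025) and math.isqrt(2025) == 45

# Neighbouring square years: 44² = 1936 (previous) and 46² = 2116 (next).
assert 44 * 44 == 1936 and 46 * 46 == 2116
```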
Very impressive story! @goodfellow_ian spent just a single night writing the code and obtaining initial experimental results. Amazing!
Very happy to hear that GANs are getting the test of time award at NeurIPS 2024. The NeurIPS test of time awards are given to papers which have stood the test of time for a decade. I took some time to reminisce about how GANs came about and how AI has evolved in the last decade.