Ning Ding
@stingning
Researcher of AI/LM. Assistant Professor @Tsinghua_Uni. Working on scalable methods for language models.
Language models are trading entropy for reward in reinforcement learning, meaning uncertainty is transformed into certainty. The trade is even quantitatively predictable: R = -a * exp(H) + b. In our latest paper, we find that we should, and we can scientifically…
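Since R = -a * exp(H) + b is linear in exp(H), the two coefficients can be recovered by ordinary least squares. A minimal sketch of that fit, using made-up (H, R) pairs rather than any data from the paper:

```python
# Fit the entropy-reward trade-off R = -a * exp(H) + b from (H, R) pairs.
# The values below are hypothetical placeholders, not the paper's data.
import numpy as np

H = np.array([1.8, 1.2, 0.7, 0.4, 0.2])      # policy entropy over training (made up)
R = np.array([0.21, 0.35, 0.44, 0.49, 0.52])  # validation reward (made up)

# R is linear in exp(H), so stack the regressors and solve least squares.
X = np.stack([-np.exp(H), np.ones_like(H)], axis=1)
(a, b), *_ = np.linalg.lstsq(X, R, rcond=None)

print(f"fitted a={a:.4f}, b={b:.4f}")
# At full entropy collapse (H = 0, exp(H) = 1) the fit predicts a reward
# ceiling of R = b - a, which is what makes the ceiling predictable.
```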
Learn RL with math, ponder it with philosophy.
Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life. One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s…
I’m at #icml25 and will be presenting MedXpertQA at tmr’s first poster session. Come chat! 🗓️Tuesday, July 14, 11am 📍West Exhibition Hall B2-B3 W-306
📈How far are leading models from mastering realistic medical tasks? MedXpertQA, our new text & multimodal medical benchmark, reveals existing gaps in model abilities. Compared with rapidly saturating benchmarks like MedQA, we raise the bar with harder questions and a sharper…
This week I read about Boltz-2, Protriever, scGraph (Metric Mirages in Cell Embeddings), and also came across the very interesting paper "Limitations of Current Machine-Learning Models in Predicting Enzyme Functions" open.substack.com/pub/lindsaytts…
AI person getting bombarded by math + biology for a week (fully self-imposed)⬇️ open.substack.com/pub/lindsaytts…
hello beautiful people of twitter, I've decided to start sharing my notes from reading AI&Bio papers on substack <3 (essentially because reading and writing are great but sadly I am incapable of doing anything without a deadline🥲) open.substack.com/pub/lindsaytts…
Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
A nice insight on scaling RL -- we gotta let the models keep entropy up.
We always want to scale up RL, yet simply training longer doesn't necessarily push the limits: exploration gets impeded by entropy collapse. We show that the performance ceiling is surprisingly predictable, and that the collapse is driven by the covariance between action log-probability and advantage.
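For anyone who wants to watch for this in their own runs, here is a minimal sketch of the diagnostic the tweet names: the covariance between sampled-action log-probabilities and their advantages. The shapes and values are assumptions for illustration, not the paper's code.

```python
# When high-log-prob actions also carry high advantage, this covariance is
# positive, and policy-gradient updates push entropy further down -- the
# "entropy collapse" referred to above.
import torch

def logp_advantage_covariance(logps: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    """Covariance over a batch of sampled tokens/actions.

    logps:      log pi(a|s) for each sampled action, shape (N,)
    advantages: advantage estimate for each action, shape (N,)
    """
    return ((logps - logps.mean()) * (advantages - advantages.mean())).mean()

# Hypothetical batch: confident (high log-prob) actions also tend to have
# high advantage, so the covariance comes out positive.
logps = torch.tensor([-0.2, -0.4, -1.5, -2.3])
advs = torch.tensor([0.9, 0.5, -0.3, -0.8])
print(logp_advantage_covariance(logps, advs))  # > 0 -> downward pressure on entropy
```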
The limits of *naive* RL, yes.
🚩Thread to share our recent work, which delves into the entropy mechanism in reinforcement learning for reasoning models. 🔗 arxiv.org/abs/2505.22617
New from PRIME-RL: The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models investigates the collapse of policy entropy and offers solutions!
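For background, the textbook counter-pressure to entropy collapse is an entropy bonus in the policy-gradient loss. The sketch below shows only that generic regularizer; it is not the paper's proposed solution, and `beta` is a hypothetical coefficient.

```python
# Generic entropy-regularized policy-gradient loss: subtracting beta * entropy
# from the loss rewards the policy for staying uncertain, which slows collapse.
import torch

def pg_loss_with_entropy_bonus(logps, advantages, entropy, beta=0.01):
    """REINFORCE-style surrogate loss with an entropy regularizer.

    logps:      log pi(a|s) of sampled actions, shape (N,)
    advantages: advantage estimates, shape (N,)
    entropy:    mean policy entropy over the batch, scalar tensor
    beta:       entropy-bonus coefficient (hypothetical value)
    """
    policy_term = -(logps * advantages).mean()  # maximize expected advantage
    return policy_term - beta * entropy          # entropy bonus keeps entropy up

# Hypothetical usage with stand-in tensors:
logps = torch.tensor([-0.3, -1.1, -0.7], requires_grad=True)
advs = torch.tensor([0.5, -0.2, 0.1])
entropy = torch.tensor(1.4, requires_grad=True)
loss = pg_loss_with_entropy_bonus(logps, advs, entropy)
loss.backward()
```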