Ning Ding
@stingning
Researcher of AI/LM. Assistant Professor @Tsinghua_Uni. Working on scalable methods for language models.
Language models are trading entropy for reward in reinforcement learning, meaning uncertainty is transformed into certainty. The trade is even quantitatively predictable: R = -a * exp(H) + b. In our latest paper, we find that we should, and we can scientifically…
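Since R = -a * exp(H) + b is linear in exp(H), the two coefficients can be recovered by ordinary least squares. A minimal sketch of that fit, using made-up (H, R) pairs rather than any data from the paper:

```python
# Fit the entropy-reward trade-off R = -a * exp(H) + b from (H, R) pairs.
# The values below are hypothetical placeholders, not the paper's data.
import numpy as np

H = np.array([1.8, 1.2, 0.7, 0.4, 0.2])      # policy entropy over training (made up)
R = np.array([0.21, 0.35, 0.44, 0.49, 0.52])  # validation reward (made up)

# R is linear in exp(H), so stack the regressors and solve least squares.
X = np.stack([-np.exp(H), np.ones_like(H)], axis=1)
(a, b), *_ = np.linalg.lstsq(X, R, rcond=None)

print(f"fitted a={a:.4f}, b={b:.4f}")
# At full entropy collapse (H = 0, exp(H) = 1) the fit predicts a reward
# ceiling of R = b - a, which is what makes the ceiling predictable.
```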
Learn RL with math, ponder it with philosophy.
Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life. One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s…
I’m at #icml25 and will be presenting MedXpertQA at tmr’s first poster session. Come chat! 🗓️Tuesday, July 14, 11am 📍West Exhibition Hall B2-B3 W-306
📈How far are leading models from mastering realistic medical tasks? MedXpertQA, our new text & multimodal medical benchmark, reveals existing gaps in model abilities. Compared with rapidly saturating benchmarks like MedQA, we raise the bar with harder questions and a sharper…
This week I read about Boltz-2, Protriever, scGraph (Metric Mirages in Cell Embeddings), and also came across the very interesting paper "Limitations of Current Machine-Learning Models in Predicting Enzyme Functions" open.substack.com/pub/lindsaytts…
AI person getting bombarded by math + biology for a week (fully self-imposed)⬇️ open.substack.com/pub/lindsaytts…
hello beautiful people of twitter, I've decided to start sharing my notes from reading AI&Bio papers on substack <3 (essentially because reading and writing are great but sadly I am incapable of doing anything without a deadline🥲) open.substack.com/pub/lindsaytts…
Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
A nice insight on scaling RL -- we gotta let the models keep entropy up.
We always want to scale up RL, yet simply training longer doesn't necessarily push the limits: exploration gets impeded by entropy collapse. We show that the performance ceiling is surprisingly predictable, and that the collapse is driven by the covariance between action log-probability and advantage.
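For anyone who wants to watch for this in their own runs, here is a minimal sketch of the diagnostic the tweet names: the covariance between sampled-action log-probabilities and their advantages. The shapes and values are assumptions for illustration, not the paper's code.

```python
# When high-log-prob actions also carry high advantage, this covariance is
# positive, and policy-gradient updates push entropy further down -- the
# "entropy collapse" referred to above.
import torch

def logp_advantage_covariance(logps: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    """Covariance over a batch of sampled tokens/actions.

    logps:      log pi(a|s) for each sampled action, shape (N,)
    advantages: advantage estimate for each action, shape (N,)
    """
    return ((logps - logps.mean()) * (advantages - advantages.mean())).mean()

# Hypothetical batch: confident (high log-prob) actions also tend to have
# high advantage, so the covariance comes out positive.
logps = torch.tensor([-0.2, -0.4, -1.5, -2.3])
advs = torch.tensor([0.9, 0.5, -0.3, -0.8])
print(logp_advantage_covariance(logps, advs))  # > 0 -> downward pressure on entropy
```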
The limits of *naive* RL, yes.
🚩Thread to share our recent work, which delves into the entropy mechanism in reinforcement learning for reasoning models. 🔗 arxiv.org/abs/2505.22617
New from PRIME-RL: The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models investigates the collapse of policy entropy and offers solutions!
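For background, the textbook counter-pressure to entropy collapse is an entropy bonus in the policy-gradient loss. The sketch below shows only that generic regularizer; it is not the paper's proposed solution, and `beta` is a hypothetical coefficient.

```python
# Generic entropy-regularized policy-gradient loss: subtracting beta * entropy
# from the loss rewards the policy for staying uncertain, which slows collapse.
import torch

def pg_loss_with_entropy_bonus(logps, advantages, entropy, beta=0.01):
    """REINFORCE-style surrogate loss with an entropy regularizer.

    logps:      log pi(a|s) of sampled actions, shape (N,)
    advantages: advantage estimates, shape (N,)
    entropy:    mean policy entropy over the batch, scalar tensor
    beta:       entropy-bonus coefficient (hypothetical value)
    """
    policy_term = -(logps * advantages).mean()  # maximize expected advantage
    return policy_term - beta * entropy          # entropy bonus keeps entropy up

# Hypothetical usage with stand-in tensors:
logps = torch.tensor([-0.3, -1.1, -0.7], requires_grad=True)
advs = torch.tensor([0.5, -0.2, 0.1])
entropy = torch.tensor(1.4, requires_grad=True)
loss = pg_loss_with_entropy_bonus(logps, advs, entropy)
loss.backward()
```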