Yuanhe Zhang
@yuanhezhang6
Pragmatic Learning Theory, using tools from probability and statistics | PhD in Stats @warwickstats 🇬🇧 | MMathStat @warwickstats 🇬🇧
(1/n) 🚀Thrilled to share our LoRA-One work (arxiv.org/abs/2502.01235) as an #ICML25 𝐨𝐫𝐚𝐥 𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧, w. Fanghui @Fanghui_SgrA (Warwick) and Yudong (Madison). Oral @ West Ballroom B, 4pm on July 17th. Poster @ West Exhibition Hall B2-B3 #W 905, 4:30pm on July 15th.
Why does RL struggle with long reasoning chains? Because finding the correct solution by chance is exponentially rare. Solution: break down the complexity of the problem somehow, and ease into it adaptively! We propose AdaBack: an adaptive backtracking method that conditions…
Why does RL struggle with tasks requiring long reasoning chains? Because “bumping into” a correct solution becomes exponentially less likely as the number of reasoning steps grows. We propose an adaptive backtracking algorithm: AdaBack. 1/n
This weekend I read papers on context engineering. My personal take: the main methods are very close to RAG's 4R: Retriever, Rewriter, Reranker, Reader. The CE methods around memory and tool-call responses are more or less covered by the 4R. Among the 4R, my favorite work is the work around the Rewriter, i.e., the query the system actually processes is not necessarily the user's original query.…
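A minimal, hypothetical sketch of the 4R shape described above (all function names and the scoring logic are placeholders of mine, not any specific library or the papers' method): the Reader ends up consuming a query that the Rewriter may have changed from the user's original.

```python
# Hypothetical 4R pipeline sketch: Rewriter -> Retriever -> Reranker -> Reader.
def rewrite(user_query: str, history: list[str]) -> str:
    # e.g. fold in chat history / resolve references; the processed query
    # need not be the user's original query.
    return user_query if not history else f"{history[-1]} ; {user_query}"

def retrieve(query: str, corpus: list[str], k: int = 10) -> list[str]:
    # Placeholder lexical retrieval: keep documents sharing any query token.
    terms = set(query.lower().split())
    return [d for d in corpus if terms & set(d.lower().split())][:k]

def rerank(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Placeholder scoring by token overlap; a real system would use a cross-encoder.
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(terms & set(d.lower().split())))[:k]

def read(query: str, docs: list[str]) -> str:
    # Placeholder "Reader": in practice, an LLM conditioned on query + passages.
    return f"Answer to {query!r} using {len(docs)} passages"

def answer(user_query: str, history: list[str], corpus: list[str]) -> str:
    q = rewrite(user_query, history)
    return read(q, rerank(q, retrieve(q, corpus)))
```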
A beautiful visual blog where you can change values, interact, and see exactly what each head does inside the transformer.
Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers for the pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in a blog below🧵👇
「 Data Contamination, Qwen2.5 」 The data contamination problem in the Qwen2.5 series has been confirmed: the models had already seen the evaluation questions during pretraining. Over the past few months, several LLM Reasoning + RL papers found that extremely weak or even random rewards could significantly boost the math reasoning ability of the Qwen series. This raised the suspicion that Qwen models had already seen the evaluation questions in the pretraining stage.…
Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? ift.tt/UnVtw01
Gradient Descent Algorithm in Hilbert Spaces under Stationary Markov Chains with $\phi$- and $\beta$-Mixing ift.tt/0mzSlCO
I'm not sure if someone has already pointed this out, but Dr. GRPO still has a bias that is more pronounced the smaller the group size is. To make it unbiased, simply multiply Dr. GRPO's A_i by the correction term N/(N-1). With this, you'll get LOOP (Leave-One-Out Proximal Policy…
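A quick numerical check of that claim (my own sketch, not code from any paper): with group rewards r_1..r_N, rescaling the group-mean advantage r_i - mean(r) by N/(N-1) recovers the leave-one-out advantage r_i - mean(r_{-i}).

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.random(4)          # rewards for a small group, N = 4
N = len(r)

# Group-mean advantage: subtract the mean that includes r_i itself
a_group_mean = r - r.mean()

# Leave-one-out advantage: subtract the mean of the *other* N-1 rewards
a_loo = r - (r.sum() - r) / (N - 1)

# The N/(N-1) correction turns one into the other
assert np.allclose(a_group_mean * N / (N - 1), a_loo)
print(a_group_mean * N / (N - 1))
print(a_loo)
```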
🪂Understanding R1-Zero-Like Training: A Critical Perspective
* DeepSeek-V3-Base already exhibits "Aha moment" before RL-tuning??
* The ever-increasing output length in RL-tuning might be due to a BIAS in GRPO??
* Getting GRPO Done Right, we achieve a 7B AIME sota! 🧵 📜Full…
It was a dream come true to teach the course I wish existed at the start of my PhD. We built up the algorithmic foundations of modern-day RL, imitation learning, and RLHF, going deeper than the usual "grab bag of tricks". All 25 lectures + 150 pages of notes are now public! 🧵
There are several hypotheses for why Adam outperforms SGD on LLMs: heavy-tailed noise, blowing up curvature, near-constant magnitude of update, etc. The one I find most compelling is label imbalance: Adam specifically improves performance on rare classes, of which there are many.
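A toy illustration of that last hypothesis (my own sketch, not from the thread): Adam's per-step update stays roughly learning-rate-sized no matter how small the gradient is, so parameters tied to rare classes, whose gradients are tiny, still move at a similar rate, while SGD's update shrinks in proportion to the gradient.

```python
import numpy as np

def sgd_step(g, lr=1e-3):
    # Plain SGD: update size scales directly with the gradient magnitude.
    return lr * g

def adam_step(g_history, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Run plain Adam on a scalar parameter and return the size of the last update.
    m = v = 0.0
    update = 0.0
    for t, g in enumerate(g_history, start=1):
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        update = lr * m_hat / (np.sqrt(v_hat) + eps)
    return update

# "Frequent class": large, consistent gradients.  "Rare class": tiny gradients.
frequent = [1.0] * 100
rare = [1e-4] * 100

print("SGD  step, frequent vs rare:", sgd_step(frequent[-1]), sgd_step(rare[-1]))
print("Adam step, frequent vs rare:", adam_step(frequent), adam_step(rare))
# SGD's rare-class update is 10^4 times smaller; Adam's two updates are both ~lr.
```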
How can we quantify uncertainty in LLMs from only a few sampled outputs? The key lies in the classical problem of missing mass—the probability of unseen outputs. This perspective offers a principled foundation for conformal prediction in query-only settings like LLMs.
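One concrete handle on the missing-mass quantity mentioned above (a sketch under my reading of the tweet, not necessarily the authors' estimator) is the classical Good-Turing estimate: the fraction of samples whose output was observed exactly once.

```python
from collections import Counter

def good_turing_missing_mass(samples):
    """Good-Turing estimate of the probability of outputs never observed:
    (# distinct outputs seen exactly once) / (total number of samples)."""
    counts = Counter(samples)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(samples)

# e.g. a handful of sampled LLM answers to the same query
answers = ["42", "42", "forty-two", "41", "42"]
print(good_turing_missing_mass(answers))  # 2/5 = 0.4: "forty-two" and "41" are singletons
```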