Yiping Wang
@ypwang61
Ph.D. student @uwcse; undergraduate @ZJU_China. Interested in mathematics, AGI, and physics.
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks! 📍RLVR with one training example can boost: - Qwen2.5-Math-1.5B: 36.0% → 73.6% - Qwen2.5-Math-7B: 51.0% → 79.2% on MATH500. 📄 Paper: arxiv.org/abs/2504.20571…
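The core of the RLVR setup can be pictured with a minimal sketch (illustrative only, not the paper's training code; the answer-matching rule here is a simplified stand-in): a verifier compares the model's final answer to ground truth and emits a binary reward, and with one training example every rollout is scored against the same target.

```python
# Minimal sketch of a verifiable reward for RLVR (illustrative, not the paper's code).

def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the normalized final answer matches, else 0.0."""
    normalize = lambda s: s.strip().rstrip(".").lower()
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0

# With one training example, every rollout is scored against the same target.
rollouts = ["72", "72.", "68"]
rewards = [verifiable_reward(r, "72") for r in rollouts]
print(rewards)  # [1.0, 1.0, 0.0]
```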

Code release! 🚀 Following up on our IMO 2025 results with the public LLM Gemini 2.5 Pro — here’s the full pipeline & general (non-problem-specific) prompts. 👉 [github.com/lyang36/IMO25] Have fun exploring! #AI #Math #LLMs #IMO2025
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
WHY do you prefer one thing over another? Reward models treat preference as a black box 😶‍🌫️ but human brains 🧠 decompose decisions into hidden attributes. We built the first system to mirror how people really make decisions in our #COLM2025 paper 🎨PrefPalette✨ Why it matters 👉🏻🧵
New SOTA for 26-circle packing: @ypwang61 achieved 2.635977 sum of radii using OpenEvolve (evolutionary optimization framework). Progress: AlphaEvolve paper reported 2.635, OpenEvolve made improvements, now new record at 2.635977. #CirclePacking #SOTA #Optimization #OpenEvolve
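For readers unfamiliar with the task: the goal is to place n non-overlapping circles inside a unit square and maximize the sum of their radii. A minimal validity checker and objective (a sketch under that container assumption; not OpenEvolve's actual code) look like:

```python
import math

def packing_valid(circles, eps=1e-9):
    """Check circles (x, y, r) fit in the unit square without overlapping."""
    for x, y, r in circles:
        if r <= 0 or x - r < -eps or x + r > 1 + eps or y - r < -eps or y + r > 1 + eps:
            return False
    for i, (x1, y1, r1) in enumerate(circles):
        for x2, y2, r2 in circles[i + 1:]:
            if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - eps:
                return False
    return True

def objective(circles):
    """Quantity being maximized: the sum of radii."""
    return sum(r for _, _, r in circles)

# Two quarter-radius circles in opposite corners: valid, sum of radii 0.5.
demo = [(0.25, 0.25, 0.25), (0.75, 0.75, 0.25)]
print(packing_valid(demo), objective(demo))  # True 0.5
```

An evolutionary search like OpenEvolve's only needs this pair of functions to score candidates: reject invalid packings, then rank the survivors by objective.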
New blog post about asymmetry of verification and "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification–the idea that some tasks are much easier to verify than to solve–is becoming an important idea as we have RL that finally works generally. Great examples of…
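A toy illustration of the asymmetry (a hypothetical example, not one from the blog post): checking a proposed integer factorization costs one multiplication, while finding it requires a search.

```python
def verify_factorization(n, factors):
    """Verification is cheap: multiply the factors and compare."""
    prod = 1
    for f in factors:
        prod *= f
    return prod == n

def solve_factorization(n):
    """Solving is expensive: trial division over candidate divisors."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

print(verify_factorization(91, [7, 13]))  # True
print(solve_factorization(91))            # [7, 13]
```

This gap is exactly what makes such tasks attractive for RL: the cheap verifier becomes the reward function.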
Remember the drama about RLVR techniques that got improved accuracy with Qwen models basically no matter what you did – incorrect rewards, one example, entropy minimization, whatever? Then there was a refutation (pic). Now there are objections to the refutation too…
Confused about the recent LLM RLVR tweet which claims reported accuracy gains can totally reverse? I was too. Until I realized some of the comparisons are unstandardized. I compiled discrepancies in a thread below 🧵👇
Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life. One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s…
I gave an invited lecture at New York University for @hhexiy's class! I covered three ideas driving the LLM revolution: scaling, emergence, and reasoning. I tried to frame them in a way that reveals why large LMs are special in the history of AI. Slides: docs.google.com/presentation/d…
Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find out this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: tinyurl.com/heuristics-con…
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…
Happy to introduce Kimina-Prover-72B! Reaching 92.2% on miniF2F using test-time RL. It can solve IMO problems using more than 500 lines of Lean 4 code! Check our blog post here: huggingface.co/blog/AI-MO/kim… And play with our demo! demo.projectnumina.ai
Why you should stop working on RL research and instead work on product // The technology that unlocked the big scaling shift in AI is the internet, not transformers I think it's well known that data is the most important thing in AI, and also that researchers choose not to work…
We built 200k-GPU clusters; We scaled up & curated higher-quality data; We scaled compute by 100x; We developed training & test-time recipes; We made everything RL native; We stabilized the infrastructure and sped it up; That's how you bring RL to pre-training scale. Yet I am…
Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data…
Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵
Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput…
🤔 How do we train AI models that surpass their teachers? 🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯 The secret? Learn from the *differences* in weak data pairs! 📜 arxiv.org/abs/2507.06187 🧵 below
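One way to picture "learning from differences" is a DPO-style pairwise objective (an illustrative sketch, not necessarily the paper's exact loss): the loss depends only on the log-probability gap within a pair, so two responses that are both weak in absolute terms can still supply a clean training signal.

```python
import math

def pairwise_delta_loss(logp_chosen, logp_rejected, beta=0.1):
    """DPO-style pairwise loss: only the *difference* between the pair
    matters, not the absolute quality of either response."""
    margin = beta * (logp_chosen - logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Same gap, very different absolute quality -> identical loss: the delta is the signal.
print(pairwise_delta_loss(-5.0, -7.0))
print(pairwise_delta_loss(-50.0, -52.0))
```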
Personalization methods for LLMs often rely on extensive user history. We introduce Curiosity-driven User-modeling Reward as Intrinsic Objective (CURIO) to encourage actively learning about the user within multi-turn dialogs. 📜 arxiv.org/abs/2504.03206 🌎 sites.google.com/cs.washington.…
Happy to share that ReasonIR is accepted by @COLM_conf! Synthetic data & test-time scaling are powerful tools to enable new capabilities for challenging tasks. I’m impressed by how quickly smaller retrievers and better rerankers have been developed with ReasonIR data! #COLM2025
Meet ReasonIR-8B✨the first retriever specifically trained for reasoning tasks! Our challenging synthetic training data unlocks SOTA scores on reasoning IR and RAG benchmarks. ReasonIR-8B ranks 1st on BRIGHT and outperforms search engine and retriever baselines on MMLU and GPQA🔥
EvalTree accepted to @COLM_conf 2025 - my first PhD work and first COLM paper 🙌! What would you like to see next—extensions, applications, or other directions? Always open to ideas! 🧐
Is a single accuracy number all we can get from model evals?🤔 🚨Does NOT tell where the model fails 🚨Does NOT tell how to improve it Introducing EvalTree🌳 🔍identifying LM weaknesses in natural language 🚀weaknesses serve as actionable guidance (paper&demo 🔗in🧵) [1/n]
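The spirit of the idea, reduced to a sketch (hypothetical data and grouping; EvalTree's actual capability tree is built automatically, not hand-labeled like this): report accuracy per node of a capability hierarchy instead of one aggregate number, so weak subtrees stand out.

```python
from collections import defaultdict

def weakness_profile(results):
    """Group per-instance results by capability path and report accuracy
    for every node of the hierarchy, not just the overall aggregate."""
    totals = defaultdict(lambda: [0, 0])  # node -> [num_correct, num_total]
    for path, correct in results:
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            totals[node][0] += int(correct)
            totals[node][1] += 1
    return {" > ".join(node): c / t for node, (c, t) in sorted(totals.items())}

# Toy eval results: (capability path, was the instance answered correctly?)
results = [
    (("math", "algebra"), True),
    (("math", "algebra"), True),
    (("math", "geometry"), False),
    (("math", "geometry"), False),
]
for node, acc in weakness_profile(results).items():
    print(f"{node}: {acc:.0%}")
```

The aggregate "math: 50%" hides that geometry is the failing subtree; the per-node view surfaces it as actionable guidance.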