Yong Lin
@Yong18850571
Postdoctoral Fellow @PrincetonPLI @Princeton. Focusing on formal math reasoning. Apple AI/ML PhD Fellow 2023. PhD from @HKUST
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: solves 64 problems, with far less compute.
🧠 New SOTA on MiniF2F:
* Our 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B's 82.4%.
* 8B > 671B: Our 8B…
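(For context on the metric above: Pass@k is the probability that at least one of k sampled proof attempts is verified correct. Below is a minimal sketch of the standard unbiased Pass@k estimator from Chen et al., 2021; the numbers in the usage line are illustrative, not taken from the Goedel-Prover evaluation.)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: the probability that at least one
    of k attempts drawn from n total samples (c of them correct)
    solves the problem, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures exist: a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative only: 1 verified proof out of 8 samples -> Pass@4 = 0.5
print(pass_at_k(n=8, c=1, k=4))
```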


🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
Another AI system, ByteDance's SeedProver, solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended compute. This is becoming routine, like when we went to the moon for the fourth time. There is *nothing* "routine" about this!!...
While IMO is trending, our model leads on college-level math (PutnamBench), nearly doubling the problems solved by the prior SOTA, with formal, verifiable proofs! Moreover, it's not just an announcement: you can actually download and use our model. 🙂
🔥 Our Goedel-Prover-V2-32B topped the PutnamBench leaderboard by solving 86 problems, nearly 2× more than the previous SOTA, DeepSeek-Prover-V2-671B (47 solved), while using:
* 1/20 the model size (32B vs. 671B)
* 1/5 the passes (184 vs. 1024)
Meanwhile, we also release *…

Congrats! As a scientist/mathematician trained to verify things rigorously, I'm curious—will we get to see a bit more than tweets and final outputs (e.g., how they were generated/selected) to verify the claims? 🙂
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Goedel Prover V2 (blog.goedel-prover.com) will be featured at @ai4mathworkshop today. Come and discuss with us!
Reward models (RMs) are key to language model post-training and inference pipelines. But little is known about the relative pros and cons of different RM types. 📰 We investigate why RMs implicitly defined by language models (LMs) often generalize worse than explicit RMs 🧵 1/6
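(A rough illustration of the contrast in this thread, with toy lookup tables standing in for real models; all names and numbers here are hypothetical. An implicit RM scores a response via the policy-to-reference log-probability ratio, DPO-style, while an explicit RM is a separately trained scalar scorer.)

```python
# Toy stand-ins: a "model" is just a dict mapping (prompt, response)
# pairs to log-probabilities. Real implementations would sum token
# log-probs from an actual LM; everything below is illustrative.
policy_lp    = {("2+2=?", "4"): -1.0, ("2+2=?", "5"): -4.0}
reference_lp = {("2+2=?", "4"): -2.0, ("2+2=?", "5"): -3.0}

def implicit_rm(prompt: str, response: str, beta: float = 0.1) -> float:
    """DPO-style implicit reward: beta times the log-probability
    ratio of the trained policy to the frozen reference model."""
    return beta * (policy_lp[(prompt, response)]
                   - reference_lp[(prompt, response)])

def explicit_rm(prompt: str, response: str) -> float:
    """Explicit RM: a separate scalar-output model trained on
    preference pairs (faked here with a lookup table)."""
    scores = {("2+2=?", "4"): 1.3, ("2+2=?", "5"): -0.7}
    return scores[(prompt, response)]

print(implicit_rm("2+2=?", "4"), explicit_rm("2+2=?", "4"))
```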
🚨 Easy math, epic fail! 🚨 Our new benchmark, Ineq-Comp, gives formal theorem provers Lean inequalities... then makes tiny tweaks (duplicating variables, squaring terms) that humans handle easily. Most provers collapse. Simple composition is still surprisingly hard!
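(To make the kind of tweak concrete, here is a sketch in Lean 4 / Mathlib in the spirit of Ineq-Comp's variable duplication; it is illustrative, not an actual benchmark item.)

```lean
import Mathlib

-- Seed inequality: one-step proof from (a - b)^2 ≥ 0.
theorem seed (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  nlinarith [sq_nonneg (a - b)]

-- "Variable duplication": the same fact stated over two independent
-- copies of the variables. Humans see two applications of `seed`;
-- provers often fail to compose them.
theorem seed_dup (a b c d : ℝ) :
    2 * a * b + 2 * c * d ≤ a ^ 2 + b ^ 2 + c ^ 2 + d ^ 2 := by
  nlinarith [sq_nonneg (a - b), sq_nonneg (c - d)]
```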
[1] Kids improve when a good teacher offers adaptive, targeted feedback. Can a small LLM benefit if a large LLM provides helpful feedback in-context? Naive ideas fail here. We propose AdaptMI: adaptive, skill-based in-context supervision that boosts 1B models by 6% on…
Our warmest congratulations to @danqi_chen, @stanfordnlp grad and now Associate Professor at @PrincetonCS and Associate Director of @PrincetonPLI on her stunning @iclr_conf keynote!
Our VP of Reinforcement Learning David Silver believes we must go “beyond what humans know” - moving towards systems that can learn for themselves, and even discover new scientific knowledge. 🧠 Listen in on his conversation with our podcast host @FryRSquared →…
I sent a message to my PhD students and postdocs at @Princeton a couple of weeks ago regarding freezes/cuts to federal research funding (this was before the freeze on federal funding to Princeton). I am sharing it here in case others find it helpful in having similar…