Bohan Lyu
@Lyubh22
Rising senior @Tsinghua_Uni. Current intern @PrincetonPLI. Previously @ucsd_cse. I'm interested in ML and NLP.
A few months ago, I teamed up with two friends and bought some APIs on Taobao to create what I believe is an important dataset. We then submitted our work to ARR. Yesterday, the results came out, and our paper received an award nomination (meta 4.5/5, which, based on previous…

🚀 Huge milestone from our Goedel-Prover team: we’ve just released new state-of-the-art models (8B & 32B) for automated theorem proving—surpassing the previous best 671B DeepSeek model by a wide margin, all with academic compute!
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…
Attending my first academic conference, #ICML25! I'll present Adapting While Learning, a project I worked on during my visit to UCSD last summer, under the guidance of amazing mentors. If you’re interested in my work or just want to chat, let’s meet at the conference!

How do we ground #LLMs for scientific problems to mitigate hallucination? Check out our #icml2025 paper, "Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation" Paper: arxiv.org/abs/2411.00412 Code:…
🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MADFormer with half of the model layers for AR and half for diffusion. AR gives a fast guess for the next patch prediction while diffusion helps refine the…
🎉 Our Spurious Rewards paper is available on arXiv! We added experiments on - More prompts/steps/models/analysis... - Spurious Prompts! Surprisingly, we obtained 19.4% gains when replacing prompts with LaTeX placeholder text (\lipsum) 😶‍🌫️ Check out our 2nd blog: tinyurl.com/spurious-prompt
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: +28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
🎥 Video diffusion models achieve stunning visual fidelity, powered by pretraining on massive internet-scale video datasets. But they’re not interactive—they don’t respond to actions or support causal rollout. 🤔 Can we harness their generative power to build autoregressive,…
Writing math proofs in Lean is surprisingly addictive. Watching Terence Tao formalize proofs in Lean feels like watching a top-tier gamer streaming on Twitch. :-) youtube.com/watch?v=c1ixXM…
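For readers who haven't tried it, the "game" being streamed looks roughly like this — a minimal Lean 4 sketch (assuming Mathlib is available; this is an illustrative toy, not a proof from the video):

```lean
import Mathlib.Tactic

-- State a goal, then discharge it with tactics; the editor shows
-- the remaining goals after each step, which is where the
-- game-like feedback loop comes from.
example (a b : ℕ) : a + b = b + a := by
  ring
```

Each tactic either closes the goal or leaves a smaller one, and the proof only compiles when nothing remains — hence the addictiveness.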
🚨 Easy math, epic fail! 🚨 Our new benchmark, Ineq-Comp, gives formal theorem provers Lean inequalities... then makes tiny tweaks (duplicating variables, squaring terms) that humans handle easily. Most provers collapse. Simple composition is still surprisingly hard!
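To make the "tiny tweaks" concrete, here is a hypothetical Lean 4 pair in the spirit of the benchmark (assuming Mathlib; these are illustrative examples, not actual Ineq-Comp items):

```lean
import Mathlib.Tactic

-- Base inequality: trivial for humans and provers alike.
example (x : ℝ) : 0 ≤ x ^ 2 := sq_nonneg x

-- A "duplicate the variable" perturbation: mathematically just as
-- easy, but compositional tweaks like this are the kind reported
-- to trip up automated provers.
example (x y : ℝ) : 0 ≤ x ^ 2 + y ^ 2 :=
  add_nonneg (sq_nonneg x) (sq_nonneg y)
```

The second goal is solved by composing two instances of the first — exactly the simple composition the tweet says provers struggle with.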
Breaking News: #NeurIPS2025 received over 25,000 abstracts. Top submitters include: ChatGPT Univ: 6,500 Claude Inst of Tech: 4,255 Gemini Polytechnic: 3,135 DeepSeek R1 labs: 2,300 Llama Herd: 1,700 These groups also graciously agreed to…
I'll bring @papercopilot to #ICML2025 and advocate for a more transparent and regulated peer review process. This position paper was accepted to the #ICML2025 Position Track. I’d love to hear your thoughts and discuss how we can better support the AI/ML community. @openreviewnet…
Looking forward to the 'RL with tool calling & multi-turn' work!
We will present the latest updates to verl at #ICLR2025: - recent RL recipes (DAPO, etc.) - RL with tool calling & multi-turn - full SGLang integration (with @lmsysorg) - large-scale optimizations, and many more. Come join us!
ICML 2025's rebuttal process be like🤣: 👨‍💻 Authors: spend a whole week writing a careful rebuttal ✅ Reviewer: clicks "acknowledge" without reading 🚫 Authors: not allowed to reply anymore So what does acknowledge mean here? "You speak. I pretend to listen. Conversation over."🙃