Chi Jin
@chijinML
Assistant Prof @Princeton. Previously: ML theory, RL & optimization. Now: AI for math, games & decision making.
🚀 Huge milestone from our Goedel-Prover team: we’ve just released a new state-of-the-art model (8B & 32B) for automated theorem proving—surpassing the previous best 671B DeepSeek model by a wide margin, all with academic compute!
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…
Last-minute ICML trip—I'll be there for the next 3 days! ✈️Ping me if you’d like to chat about Goedel-Prover-V2, Pokémon, or anything, and stop by our posters!

#Day10 of #100DaysOfRL Starting to watch the lectures from ECE524 Foundations of Reinforcement Learning by @chijinML to clear up my mathematical foundations.
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
Our new work on simulating economic systems using large language models is now online!
🚀 New preprint! 🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare—all by optimizing utilities in‑context? Meet the LLM Economist ↓
This is a really strong result. A big leap in formal math!
Another AI system, ByteDance's SeedProver solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended compute. This is becoming routine, like when we went to the moon for the fourth time. There is *nothing* "routine" about this!!...
⏱️AI is making the verification process easier, with models verifying proofs in minutes. 💻 Now, @prfsanjeevarora, @chijinML, @danqi_chen and @PrincetonPLI have released Goedel Prover V2, a model more efficient and more accurate than any previous model. 👉 blog.goedel-prover.com
While IMO is trending, our model leads on college-level math (Putnam Benchmark)—nearly doubling the problems solved by prior SOTA, with formal, verifiable proofs! Moreover, it’s not just an announcement—you can actually download and use our model. 🙂
🔥Our Goedel-Prover-V2-32B topped the PutnamBench Leaderboard by solving 86 problems, nearly 2× more than the previous SOTA DeepSeek-Prover-V2-671B (which solved 47), while using: * 1/20 the model size (32B vs. 671B) * 1/5 the passes (184 vs. 1024) Meanwhile, we also release *…
Congrats! As a scientist/mathematician trained to verify things rigorously, I'm curious—will we get to see a bit more than tweets and final outputs (e.g., how they were generated/selected) to verify the claims? 🙂
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
I will also give a talk about theorem proving and Goedel-Prover V2 at 12:45 today at @ai4mathworkshop. Drop by our talk and poster if you're at ICML!
Goedel Prover V2 (blog.goedel-prover.com) will be featured at @ai4mathworkshop today. Come and discuss with us!
👾Are you interested in LLMs for two-player competitive games with partial information? Or perhaps just a Pokemon fan? Come check out our #ICML spotlight poster at 4:30PM in West Exhibition Hall B2-B3 #W-815
🚀 6 days until my ICML spotlight poster! Key insights we’ll unpack: • Base LLM + test-time planning • Game-theoretic scaffolding • Context-engineered opponent prediction • Comparative LLM-as-judge (relative > absolute) Catch me Thu Jul 17, 4:30-7 PM PT👇
Formal math taking off at @PrincetonPLI! The new Goedel-Prover V2 8B model matches the 2.5-month-old DeepSeek-Prover-V2 671B, but is 80x smaller. Our 32B model is much better on all benchmarks (miniF2F, IMO, Putnam). I'm excited/shocked by how much this field has advanced in 6-7 months…
Excited to release our NeurIPS 2025 PokéAgent Challenge! Pokémon becomes a testbed for long-horizon learning and stochastic game theory. Curious to see which algorithms hold up under pressure.
🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. Two tracks: ① Showdown Battling – imperfect-info, turn-based strategy ② Pokémon Emerald Speedrunning – long-horizon RPG planning 5M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid…
Congratulations to my brilliant and accomplished student Qinghua Liu @qinghual2020 on his graduation! 🎓 Excited for his next chapter at #OpenAI. Also honored to hood our amazing Princeton graduates: Chia-Hao Li, Kurtland Chua, Kexin Jin, and Yixiao Chen! youtu.be/Bf1K9TcehOg
Excited to share that the PokeAgent challenge was accepted as a @NeurIPSConf competition! This should serve as an excellent standardized benchmark for competitive games AND ‘speedrunning’ the RPG. I hope to see both the RL and LLM agent communities working together here to eval…