Chi Jin (@chijinML)

Pinned

Chi Jin @chijinML · Jul 15

🚀 Huge milestone from our Goedel-Prover team: we’ve just released a new state-of-the-art model (8B & 32B) for automated theorem proving—surpassing the previous best 671B DeepSeek model by a wide margin, all with academic compute!

Yong Lin @Yong18850571 · Jul 15

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…

Pinned

Chi Jin @chijinML · Jul 16

Last-minute ICML trip: I'll be there for the next 3 days! ✈️ Ping me if you’d like to chat about Goedel-Prover-V2, Pokémon, or anything, and stop by our posters!

Chi Jin Retweeted

Rajeev Ranjan Pandey @rrpandey_in · Jul 26

#Day10 of #100DaysOfRL Starting to watch the lectures from ECE524 Foundations of Reinforcement Learning by @chijinML to clear up my mathematical foundations.

Chi Jin Retweeted

Kaiyu Yang @KaiyuYang4 · Jul 23

🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California

Chi Jin @chijinML · Jul 23

Our new work on simulating economic systems using large language models is now online!

Seth Karten @sethkarten · Jul 23

🚀 New preprint! 🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare—all by optimizing utilities in‑context? Meet the LLM Economist ↓

Chi Jin @chijinML · Jul 23

This is a really strong result. A big leap in formal math!

Alex Kontorovich @AlexKontorovich · Jul 23

Another AI system, ByteDance's SeedProver solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended compute. This is becoming routine, like when we went to the moon for the fourth time. There is *nothing* "routine" about this!!...

Chi Jin Retweeted

Princeton Computer Science @PrincetonCS · Jul 23

⏱️ AI is making the verification process easier, with models verifying proofs in minutes. 💻 Now, @prfsanjeevarora, @chijinML, @danqi_chen and @PrincetonPLI have released Goedel-Prover V2, a model more efficient and more accurate than any previous model. 👉 blog.goedel-prover.com

Chi Jin @chijinML · Jul 22

While IMO is trending, our model leads on college-level math (Putnam Benchmark)—nearly doubling the problems solved by prior SOTA, with formal, verifiable proofs! Moreover, it’s not just an announcement—you can actually download and use our model. 🙂

Yong Lin @Yong18850571 · Jul 22

🔥 Our Goedel-Prover-V2-32B topped the PutnamBench Leaderboard by solving 86 problems, nearly 2× more than the previous SOTA DeepSeek-Prover-V2-671B (solved 47), while using:
* 1/20 the model size (32B vs. 671B)
* 1/5 the passes (184 vs. 1024)
Meanwhile, we also release *…

Chi Jin @chijinML · Jul 19

Congrats! As a scientist/mathematician trained to verify things rigorously, I'm curious—will we get to see a bit more than tweets and final outputs (e.g., how they were generated/selected) to verify the claims? 🙂

Alexander Wei @alexwei_ · Jul 19

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

Chi Jin @chijinML · Jul 18

I will also give a talk about theorem proving and Goedel-Prover V2 at 12:45 today at @ai4mathworkshop. Drop by our talk and poster if you are at ICML!

Bohan Lyu @Lyubh22 · Jul 18

Goedel Prover V2 (blog.goedel-prover.com) will be featured at @ai4mathworkshop today. Come and discuss with us!

Chi Jin @chijinML · Jul 17

👾Are you interested in LLMs for two-player competitive games with partial information? Or perhaps just a Pokemon fan? Come check out our #ICML spotlight poster at 4:30PM in West Exhibition Hall B2-B3 #W-815

Seth Karten @sethkarten · Jul 11

🚀 6 days until my ICML spotlight poster! Key insights we’ll unpack:
• Base LLM + test-time planning
• Game-theoretic scaffolding
• Context-engineered opponent prediction
• Comparative LLM-as-judge (relative > absolute)
Catch me Thu Jul 17, 4:30-7 PM PT 👇

Chi Jin @chijinML · Jul 15

Formal math is taking off at @PrincetonPLI! The new Goedel-Prover V2 8B model matches the 2.5-month-old DeepSeek-Prover-V2 671B but is 80x smaller. Our 32B model is much better on all benchmarks (miniF2F, IMO, Putnam). I'm excited/shocked by how much this field has advanced in 6-7 months…

Yong Lin @Yong18850571 · Jul 15

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…

Chi Jin Retweeted

Yong Lin @Yong18850571 · Jul 15

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…

Chi Jin @chijinML · Jul 14

Excited to release our NeurIPS 2025 PokéAgent Challenge! Pokémon becomes a testbed for long-horizon learning and stochastic game theory. Curious to see which algorithms hold up under pressure.

Seth Karten @sethkarten · Jul 14

🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. Two tracks: ① Showdown Battling – imperfect-info, turn-based strategy ② Pokemon Emerald Speedrunning – long horizon RPG planning 5 M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid…

Chi Jin Retweeted

Seth Karten @sethkarten · Jul 14

🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. Two tracks: ① Showdown Battling – imperfect-info, turn-based strategy ② Pokemon Emerald Speedrunning – long horizon RPG planning 5 M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid…

Chi Jin @chijinML · May 27

Congratulations to my brilliant and accomplished student Qinghua Liu @qinghual2020 on his graduation! 🎓 Excited for his next chapter at #OpenAI. Also honored to hood our amazing Princeton graduates: Chia-Hao Li, Kurtland Chua, Kexin Jin, and Yixiao Chen! youtu.be/Bf1K9TcehOg

Chi Jin Retweeted

Seth Karten @sethkarten · May 26

Excited to share that the PokeAgent challenge was accepted as a @NeurIPSConf competition! This should serve as an excellent standardized benchmark for competitive games AND ‘speedrunning’ the RPG. I hope to see both the RL and LLM agent communities working together here to eval…
