Kaiyu Yang (@KaiyuYang4)

Pinned

Kaiyu Yang @KaiyuYang4 · Jul 23

🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025!
📅 Dec 6 or 7 (TBD), 2025
🌴 San Diego, California

7

41

216

45

23.0K

Kaiyu Yang Retweeted

Princeton Computer Science @PrincetonCS · Jul 23

⏱️ AI is making the verification process easier, with models verifying proofs in minutes. 💻 Now, @prfsanjeevarora, @chijinML, @danqi_chen and @PrincetonPLI have released Goedel Prover V2, a model more efficient and more accurate than any previous model. 👉 blog.goedel-prover.com

0

25

87

23

17.0K

Kaiyu Yang Retweeted

Wenda Li @WendaLi8 · Jul 23

Lovely to see the impressive performance of the Seed Prover developed by the ByteDance Seed team at IMO 2025: a silver-level score (30 out of 42) within three days, and 35 out of 42 with extended compute time. leanprover.zulipchat.com/#narrow/channe…

2

25

74

24

6.0K

Kaiyu Yang Retweeted

Alex Kontorovich @AlexKontorovich · Jul 23

Another AI system, ByteDance's SeedProver solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended compute. This is becoming routine, like when we went to the moon for the fourth time. There is *nothing* "routine" about this!!...

10

52

463

122

39.0K

Kaiyu Yang @KaiyuYang4 · Jul 22

While IMO is trending, our model leads on college-level math (Putnam Benchmark)—nearly doubling the problems solved by prior SOTA, with formal, verifiable proofs! Moreover, it’s not just an announcement—you can actually download and use our model. 🙂

Yong Lin @Yong18850571 · Jul 22

🔥Our Goedel-Prover-V2-32B topped the PutnamBench Leaderboard by solving 86 problems, nearly 2× more than the previous SOTA DeepSeek-Prover-V2-671B (solved 47), while using:
* 1/20 the model size (32B vs. 671B)
* 1/5 the passes (184 vs. 1024)
Meanwhile, we also release *…

4

21

167

48

14.0K

Kaiyu Yang Retweeted

Demis Hassabis @demishassabis · Jul 21

Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team! deepmind.google/discover/blog/…

201

765

6.0K

634

1.4M

Kaiyu Yang Retweeted

Alexander Wei @alexwei_ · Jul 19

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

406

1.0K

7.0K

2.0K

5.3M

Kaiyu Yang Retweeted

AK @_akhaliq · Jul 16

Goedel-Prover-V2: The Strongest Open-Source Theorem Prover to Date

9

34

312

165

48.0K

Kaiyu Yang @KaiyuYang4 · Jul 16

SOTA on PutnamBench with a 32b model (and highly competitive 8b): Goedel team is not messing around. Unsurprisingly most of the performance gains rely on a better synthetic pipeline.

Yong Lin @Yong18850571 · Jul 15

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems, with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B…

3

5

56

24

5.0K

Kaiyu Yang @KaiyuYang4 · Jul 15

Formal math taking off at @PrincetonPLI! New Goedel-Prover V2 8B model matches the 2.5-month-old DeepSeek-Prover-V2-671B, but is 80x smaller. Our 32B model is much better on all benchmarks (miniF2F, IMO, Putnam). I'm excited/shocked by how much this field has advanced in 6-7 months…

Yong Lin @Yong18850571 · Jul 15

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems, with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B…

0

16

91

29

8.0K

Kaiyu Yang @KaiyuYang4 · Jul 15

🚀 Huge milestone from our Goedel-Prover team: we’ve just released a new state-of-the-art model (8B & 32B) for automated theorem proving—surpassing the previous best 671B DeepSeek model by a wide margin, all with academic compute!

Yong Lin @Yong18850571 · Jul 15

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems, with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B…

3

10

55

6

6.0K

Kaiyu Yang @KaiyuYang4 · Jul 15

Our Goedel-Prover-V2 doubled the SOTA Pass@32 performance on PutnamBench with a 20x smaller model, making it the strongest open-source theorem prover to date!

Yong Lin @Yong18850571 · Jul 15

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems, with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B…

0

14

88

16

12.0K

Kaiyu Yang @KaiyuYang4 · Jul 9

So proud! Go work with Gabriel, he’ll be the best advisor

Gabriel Poesia @GabrielPoesia · Jul 8

Thrilled to join the UMich faculty in 2026! I'll also be recruiting PhD students this upcoming cycle. If you're interested in AI and formal reasoning, consider applying!

0

5

30

3

6.0K

Kaiyu Yang Retweeted

Swarat Chaudhuri @swarat · Jul 1

Passionate about frontier AI models, classical symbolic reasoning, and safe/secure software? Consider applying for this position on AI-aided code analysis in my team at @GoogleDeepmind: job-boards.greenhouse.io/deepmind/jobs/…. The job is London-based, and the application deadline is July 14.

5

16

41

13

3.0K

Kaiyu Yang Retweeted

Dawn Song @dawnsongtweets · Jun 18

1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work:
🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects
💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars
🤖…

28

141

485

335

101.0K