Marco Dos Santos
@dsantosmarco
PhD student at the University of Cambridge, working on AI for formal mathematics. Core contributor to Kimina-Prover, Llemma and OpenWebMath.
Very proud of our new model, Kimina-Prover Preview! It’s the first large reasoning model for theorem proving, and achieves SOTA on miniF2F (80%). I strongly believe in RL for formal mathematics. Here is why. 🧵
We believe formal math is the future. 🔥Introducing Kimina-Prover Preview, a Numina & @Kimi_Moonshot collaboration, the first large formal reasoning model for Lean 4, achieving 80.78% miniF2F. github.com/MoonshotAI/Kim…
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Happy to introduce Kimina-Prover-72B! It reaches 92.2% on miniF2F using test-time RL and can solve IMO problems using more than 500 lines of Lean 4 code! Check out our blog post here: huggingface.co/blog/AI-MO/kim… And play with our demo! demo.projectnumina.ai
New milestone for Project Numina and Kimi Moonshot! 🚀 We are open sourcing our KiminaProver-72B. This SotA theorem-proving model comes with Test-Time Reinforcement Learning Search and Error-Fixing Capability. We’re putting it to the test soon, with the IMO just around the corner…
Hello World! 👋 We're thrilled to officially launch the X account for Numina, dedicated to advancing frontier AI in mathematics. Stay tuned for updates on our research, achievements, and the future of mathematical AI! #AI4Math #FormalMath #LeanProver #AutomatedReasoning…
Glad to share what we've been working on for the past 4 months. We've got a sweet RL stack and two nice reasoning models. Try it out at chat.mistral.ai, select think/pure thinking.
Announcing Magistral, our first reasoning model designed to excel in domain-specific, transparent, and multilingual reasoning.
Congrats to the @deepseek_ai team for pushing the SOTA on miniF2F to 89%! It’s exciting to see the long CoT approach, which we first applied to theorem proving with Kimina-Prover, being explored independently, with interesting differences. Formal math is more popular than ever!
We just released DeepSeek-Prover V2.
- Solves nearly 90% of miniF2F problems
- Significantly improves the SoTA performance on PutnamBench
- Achieves a non-trivial pass rate on AIME 24 & 25 problems in their formal version
GitHub: github.com/deepseek-ai/De…
The last two problems left unsolved by AlphaProof at last year's IMO were both combinatorics problems. Introducing CombiBench @Kimi_Moonshot, a benchmark focusing on combinatorics problems! 🔥 🏆moonshotai.github.io/CombiBench/ 📘Dataset -> huggingface.co/datasets/AI-MO…
Been working hard pushing Grok 3 Mini reasoning capabilities to the performance/price frontier 🚀 Join our reasoning team to help us build even smarter models!
Meet the Grok 3 family, now on our API! Grok 3 Mini outperforms reasoning models at 5x lower cost, redefining cost-efficient intelligence. Grok 3, the world's strongest non-reasoning model, excels in tasks that need real world knowledge like law, finance, and healthcare.
Just released all the correct proofs and full thinking traces from Kimina-Prover Preview! 🧠📜 Explore them on GitHub: 🔗 github.com/MoonshotAI/Kim… Also, our arXiv preprint is live! If you find it helpful, consider citing us 🙏 📄 arxiv.org/abs/2504.11354
🚀 Thrilled to be a core contributor to Kimina-Prover. Huge thanks to the Numina & Moonshot teams! @Kimi_Moonshot This has been the most exciting project I've worked on since entering the field. A minimal, flexible, and powerful approach that unifies ATP ideas like no other.