Han Guo
@HanGuo97
PhD Student @MIT_CSAIL | Past: @LTIatCMU @MITIBMLab @UNCNLP, @SFResearch, @BaiduResearch | Machine Learning, NLP.
Quick thread on the recent IMO results and the relationship between symbol manipulation, reasoning, and intelligence in machines and humans:
👉 New preprint! Today, many of the biggest challenges in LM post-training aren't just about correctness, but rather consistency & coherence across interactions. This paper tackles some of these issues by optimizing reasoning LMs for calibration rather than accuracy...
🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --…
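To make the idea concrete, here is a minimal sketch of a calibration-aware reward in the spirit of RLCR: the model verbalizes a confidence alongside its answer, and a Brier-style penalty is subtracted from the usual correctness reward, so a confidently wrong answer scores worse than a hedged one. The function name and exact combination below are illustrative assumptions, not the paper's formulation.

```python
# Hedged sketch: a calibration-aware RL reward in the spirit of RLCR.
# Assumption: the model emits an answer plus a verbalized confidence q in [0, 1];
# the exact reward used in the paper may differ.

def calibration_reward(answer_correct: bool, confidence: float) -> float:
    """Correctness reward minus a Brier-style penalty for miscalibrated confidence."""
    y = 1.0 if answer_correct else 0.0
    brier_penalty = (confidence - y) ** 2   # 0 when confidence matches the outcome
    return y - brier_penalty

# A confidently wrong answer is penalized more than an uncertain wrong one:
print(calibration_reward(False, 0.95))  # ≈ -0.90
print(calibration_reward(False, 0.30))  # ≈ -0.09
print(calibration_reward(True, 0.90))   # ≈  0.99
```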
Life update: I’m excited to share that I’ll be starting as faculty at the Max Planck Institute for Software Systems (@mpi_sws_) this Fall! 🎉 I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html
fun new paper training LLMs to analyze their own uncertainty and be more calibrated in their confidence! arxiv.org/abs/2507.16806
In our continued commitment to open science, we are releasing the Voxtral Technical Report: arxiv.org/abs/2507.13264 The report covers details on pre-training, post-training, alignment and evaluations. We also present an analysis of selecting the optimal model architecture, which…
🥳 Gap year update: I'll be joining @allen_ai/@UW for 1 year (Sep2025-Jul2026 -> @JHUCompSci) & looking forward to working with amazing folks there, incl. @RanjayKrishna, @HannaHajishirzi, Ali Farhadi. 🚨 I’ll also be recruiting PhD students for my group at @JHUCompSci for Fall…
Sharing some personal updates 🥳: - I've completed my PhD at @unccs! 🎓 - Starting Fall 2026, I'll be joining the Computer Science dept. at Johns Hopkins University (@JHUCompSci) as an Assistant Professor 💙 - Currently exploring options + finalizing the plan for my gap year (Aug…
🚨 The era of infinite internet data is ending, so we ask: 👉 What’s the right generative modelling objective when data, not compute, is the bottleneck? TL;DR: ▶️ Compute-constrained? Train autoregressive models ▶️ Data-constrained? Train diffusion models Get ready for 🤿 1/n
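For reference, the two training objectives being compared are, roughly, the exact autoregressive negative log-likelihood and a masked-diffusion-style denoising bound on it. The notation below is generic and schematic, not taken from the paper.

```latex
% Autoregressive objective: exact negative log-likelihood of next-token predictions
\mathcal{L}_{\text{AR}} = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)

% Masked-diffusion objective (schematic): an expected denoising loss over random
% masking rates, which upper-bounds the negative log-likelihood
\mathcal{L}_{\text{Diff}} = \mathbb{E}_{t \sim \mathcal{U}(0,1)}\,
\mathbb{E}_{x_t \sim q(\cdot \mid x_0, t)}
\left[ w(t) \sum_{i \,:\, x_t^i = [\text{MASK}]} -\log p_\theta\!\left(x_0^i \mid x_t\right) \right]
```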
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
developer.nvidia.com/blog/cutlass-p… marks the start of a short series of blogposts about CUTLASS 3.x and CuTe that we've been meaning to write for years. There are a few more parts to come still, hope you enjoy!
In this report we describe the 2025 Apple Foundation Models ("AFM"). We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device AFM model. machinelearning.apple.com/research/apple…
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 github.com/microsoft/Arch… (1/4)
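The μP++ recipe itself isn't described in the post, but the underlying idea of width-aware scaling rules can be illustrated with plain μP-style Adam learning-rate transfer: hyperparameters tuned at a small base width are rescaled for larger widths instead of re-swept. The function and layer-type labels below are assumptions for illustration only.

```python
# Hedged sketch: μP-style width scaling of per-layer Adam learning rates.
# This shows the standard μP idea (tune at a small base width, transfer to larger widths);
# μP++ specifics are in the linked repo and may differ.

def mup_adam_lr(base_lr: float, base_width: int, width: int, layer_type: str) -> float:
    """Scale a learning rate tuned at base_width for use at a larger width."""
    ratio = width / base_width
    if layer_type == "hidden":                    # fan-in grows with width: LR shrinks as 1/width
        return base_lr / ratio
    if layer_type in ("input_embedding", "bias"): # width-independent fan-in: LR unchanged
        return base_lr
    raise ValueError(f"unknown layer_type: {layer_type}")

# Tune at width 256, deploy at width 4096 without re-sweeping:
print(mup_adam_lr(3e-3, 256, 4096, "hidden"))           # 1.875e-04
print(mup_adam_lr(3e-3, 256, 4096, "input_embedding"))  # 3e-03
```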
Excited to share the agent with the world! It’s a good agent!
ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.
Presenting our ICML spotlight poster today at 11am @ E-606 w/ @jyo_pari! We need to fundamentally change how we train to achieve true reasoning. Reward-based Pretraining (RPT) > Supervised Pretraining
🧙‍♂️ Excited to share our new whitepaper “General Reasoning Requires Learning to Reason from the Get-Go.” We argue that simply making models bigger and feeding them more data is NOT enough for robust, adaptable reasoning. (1/n)
Introducing the world's best (and open) speech recognition models!
Laker and I are presenting this work in an hour at ICML poster E-2103. It’s on a theoretical framework and language (modula) for optimizers that are fast (like Shampoo) and scalable (like muP). You can think of modula as Muon extended to general layer types and network topologies
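For readers who haven't seen Muon, here is a hedged sketch of the kind of update it performs, so the "Muon extended to general layer types" framing has something concrete behind it: accumulate momentum for a 2-D weight matrix, orthogonalize it with a Newton–Schulz iteration, then step. The helper names and the classic cubic coefficients below are illustrative assumptions, not the modula or Muon implementation.

```python
# Hedged sketch: a Muon-style orthogonalized-momentum update (illustrative only).
import torch

def newton_schulz_orthogonalize(m: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate the nearest (semi-)orthogonal matrix to m, i.e. U V^T from its SVD."""
    x = m / (m.norm() + 1e-7)            # Frobenius normalization keeps singular values <= 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x  # cubic iteration pushes singular values toward 1
    return x

def muon_style_step(weight: torch.Tensor, grad: torch.Tensor, momentum: torch.Tensor,
                    lr: float = 0.02, beta: float = 0.95) -> None:
    """One update: accumulate momentum, orthogonalize it, then take a descent step."""
    momentum.mul_(beta).add_(grad)
    weight.add_(newton_schulz_orthogonalize(momentum), alpha=-lr)
```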