Zhanpeng Zhou
@zhanpeng_zhou
Ph.D. candidate @sjtu1896 | Exploring the theoretical foundations of deep learning.
Interestingly, two days after I posted the tweet introducing Alita, the GAIA validation leaderboard was removed. GAIA Leaderboard: huggingface.co/spaces/gaia-be… RIP🕯️🕯️🕯️
The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI…
Excited to share our new work: “Scaling Diffusion Transformers Efficiently via μP” Grateful for the amazing co-authors! 📄 Paper: arxiv.org/abs/2505.15270 💻 Code: github.com/ML-GSAI/Scalin… #DiffusionModels #Transformers #muP #AIresearch #scaling #MachineLearning #GenerativeAI
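As a rough sketch of the μP idea (a simplification, not the paper's implementation — full μP also prescribes initialization and output-layer scalings): matrix-like hidden weights get a learning rate scaled with width, so hyperparameters tuned on a narrow proxy transfer to wider models. The `base_width`/`width` arguments below are illustrative.

```python
import torch

def mup_param_groups(model, base_lr, base_width, width):
    """muP-style Adam param groups (simplified sketch).

    Matrix-like (hidden) weights get their lr scaled by base_width / width,
    so a base_lr tuned on a narrow proxy model transfers to wider ones.
    Vector-like params (biases, norms) keep the unscaled lr.
    """
    scale = base_width / width
    hidden = [p for p in model.parameters() if p.ndim >= 2]
    vector = [p for p in model.parameters() if p.ndim < 2]
    return [
        {"params": hidden, "lr": base_lr * scale},  # width-scaled lr
        {"params": vector, "lr": base_lr},          # unscaled lr
    ]

# Usage: tune base_lr at width 256, then reuse it at width 1024.
# optimizer = torch.optim.AdamW(mup_param_groups(model, 3e-4, 256, 1024))
```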
RIKEN AIP: 29 papers have been accepted at ICML 2025 aip.riken.jp/news/icml2025/…
This looks cool: a principled blockwise learning-rate design that offsets the block heterogeneity in Transformers' Hessian. The authors observed a ~2x speedup in LLM pre-training. Sounds very reasonable and promising to me😀
Two papers got accepted by #ICML2025 🥳🥳 [1/2] We discover that different blocks in Transformers exhibit a notable disparity in sharpness. We then propose Blockwise LR, accelerating large language model (LLM) pre-training (~2x speedup). arxiv.org/abs/2502.19002
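A minimal sketch of the general idea in PyTorch: assign each Transformer block its own learning rate via optimizer param groups. The name prefixes and multiplier values below are placeholders, not the paper's prescription (which derives them from measured blockwise sharpness).

```python
import torch

def blockwise_param_groups(model, base_lr, block_multipliers):
    """Per-block learning rates via optimizer param groups (sketch).

    `block_multipliers` maps a parameter-name prefix (e.g. "layers.3")
    to an lr multiplier; unmatched params keep multiplier 1.0. The paper
    derives the multipliers from measured blockwise sharpness, which this
    snippet does not compute.
    """
    groups = {}
    for name, p in model.named_parameters():
        prefix = next((k for k in block_multipliers if name.startswith(k)), None)
        mult = block_multipliers.get(prefix, 1.0)
        groups.setdefault(mult, []).append(p)
    return [{"params": ps, "lr": base_lr * m} for m, ps in groups.items()]

# Placeholder multipliers (not the paper's values):
# optimizer = torch.optim.AdamW(
#     blockwise_param_groups(model, 3e-4, {"embed": 0.5, "layers.0": 1.5}))
```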
Our paper studying the dynamics of a simple Markov model for CoT reasoning has been accepted to #ICML2025! (reposting a nice summary↓)
"Metastable Dynamics of Chain-of-Thought Reasoning" (Kim et al., 2025) breaks down how to improve LLM reasoning using search, reinforcement learning (RL), and distillation. Key takeaways: 🔍 Search identifies critical (hard) reasoning steps, reducing the number of steps needed…
First day of #ICLR2025, I will present "Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training" (Spotlight!). We found that even a few epochs of SAM applied at the end of training can significantly improve generalization. See you on the 24th!
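For reference, here is a minimal SAM step in PyTorch, assuming a closure `loss_fn` that recomputes the forward loss (a sketch, not our exact training code). Per the finding above, switching from SGD to this update only in the final epochs already suffices.

```python
import torch

def sam_step(model, loss_fn, optimizer, rho=0.05):
    """One Sharpness-Aware Minimization step (minimal sketch).

    Ascent: perturb weights by rho * g / ||g||, then descend using the
    gradient taken at the perturbed point.
    """
    loss = loss_fn()
    loss.backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (norm + 1e-12)
            p.add_(e)                 # ascent to the sharpest nearby point
            eps.append((p, e))
    optimizer.zero_grad()
    loss_fn().backward()              # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in eps:
            p.sub_(e)                 # restore the original weights
    optimizer.step()                  # descend with the SAM gradient
    optimizer.zero_grad()
```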

We are excited to introduce our new paper RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm. RaanA is a novel PTQ algorithm that is computationally efficient, calibration-light, and adaptable to diverse deployment scenarios. 🧵 (1/6)
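For readers new to post-training quantization, a generic round-to-nearest (RTN) baseline looks like the sketch below. This is background only, not RaanA's algorithm, which the paper develops well beyond this.

```python
import torch

def quantize_rtn(weight, bits=4):
    """Per-channel round-to-nearest weight quantization (generic baseline).

    `weight` is a 2-D [out_features, in_features] tensor. Returns integer
    codes plus one fp scale per output channel.
    """
    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4-bit signed
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)                  # avoid div-by-zero rows
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q, scale):
    """Approximate reconstruction of the original weights."""
    return q.to(scale.dtype) * scale
```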
ICML 2025's rebuttal process be like🤣: 👨‍💻 Authors: spend a whole week writing a careful rebuttal ✅ Reviewer: clicks "acknowledge" without reading 🚫 Authors: not allowed to reply anymore So what does "acknowledge" mean here? "You speak. I pretend to listen. Conversation over."🙃
Ryuichi Kanoh (@ryuichi_74), a member of my lab, has earned his Ph.D. and received the SOKENDAI Award. Congratulations! soken.ac.jp/news/2024/2025… Also, his latest work, achieving LMC (linear mode connectivity) in Tree Ensembles, has been accepted as a spotlight at ICLR2025! openreview.net/forum?id=UqYNP…
The code is released at github.com/zzp1012/SAM-in…. Reproduce our results if you are interested! In addition, we created some slides for better illustration. Check them out at zzp1012.github.io/data/talks/SAM…
#ICLR2025 Two papers got accepted!🎉 (1/2) openreview.net/forum?id=aD2uw… We study the implicit bias of SAM during the late phase of training, revealing that SAM efficiently selects flatter minima than SGD even when applied only in the last few epochs.
Today we introduce an AI co-scientist system, designed to go beyond deep research tools to aid scientists in generating novel hypotheses & research strategies. Learn more, including how to join the Trusted Tester Program, at goo.gle/417wJrA
🎉 Thrilled that our paper "On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent" is a Spotlight at #ICLR2025! Huge thanks to my collaborators & reviewers! Excited to discuss at the conference! 📄 Paper: openreview.net/forum?id=97rOQ…
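For context, the update analyzed in the paper is plain sign gradient descent; a minimal sketch (momentum and weight-decay variants omitted):

```python
import torch

@torch.no_grad()
def sign_gd_step(params, lr=1e-3):
    """One plain sign-gradient-descent step: w <- w - lr * sign(grad)."""
    for p in params:
        if p.grad is not None:
            p.add_(p.grad.sign(), alpha=-lr)
```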
Both papers were selected as Spotlights by the #ICLR 2025 committee! 🥳 Really appreciate the acknowledgement from my ACs and reviewers.
UC Berkeley's "Introduction to Mathematical Thinking" Lecture videos & slides: imt-decal.org