Zhanpeng Zhou
@zhanpeng_zhou
Ph.D. candidate @sjtu1896 | Exploring the theoretical foundations of deep learning.
Interestingly, two days after I posted the tweet introducing Alita, the GAIA validation leaderboard was removed. GAIA Leaderboard: huggingface.co/spaces/gaia-be… RIP🕯️🕯️🕯️
The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI…
Excited to share our new work: “Scaling Diffusion Transformers Efficiently via μP” Grateful for the amazing co-authors! 📄 Paper: arxiv.org/abs/2505.15270 💻 Code: github.com/ML-GSAI/Scalin… #DiffusionModels #Transformers #muP #AIresearch #scaling #MachineLearning #GenerativeAI
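As a rough sketch of the μP idea (a simplification, not the paper's implementation — full μP also prescribes initialization and output-layer scalings): matrix-like hidden weights get a learning rate scaled with width, so hyperparameters tuned on a narrow proxy transfer to wider models. The `base_width`/`width` arguments below are illustrative.

```python
import torch

def mup_param_groups(model, base_lr, base_width, width):
    """muP-style Adam param groups (simplified sketch).

    Matrix-like (hidden) weights get their lr scaled by base_width / width,
    so a base_lr tuned on a narrow proxy model transfers to wider ones.
    Vector-like params (biases, norms) keep the unscaled lr.
    """
    scale = base_width / width
    hidden = [p for p in model.parameters() if p.ndim >= 2]
    vector = [p for p in model.parameters() if p.ndim < 2]
    return [
        {"params": hidden, "lr": base_lr * scale},  # width-scaled lr
        {"params": vector, "lr": base_lr},          # unscaled lr
    ]

# Usage: tune base_lr at width 256, then reuse it at width 1024.
# optimizer = torch.optim.AdamW(mup_param_groups(model, 3e-4, 256, 1024))
```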
RIKEN AIP: 29 papers have been accepted at ICML 2025 aip.riken.jp/news/icml2025/…
This looks cool: a principled blockwise learning-rate design that offsets the block heterogeneity in Transformers' Hessian. The authors observed a ~2x speedup in LLM pre-training. Sounds very reasonable and promising to me😀
Two papers got accepted by #ICML2025 🥳🥳 [1/2] We discover that different blocks in Transformers exhibit a notable disparity in sharpness. We then propose Blockwise LR, accelerating large language model (LLM) pre-training (~2x speedup). arxiv.org/abs/2502.19002
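A minimal sketch of the general idea in PyTorch: assign each Transformer block its own learning rate via optimizer param groups. The name prefixes and multiplier values below are placeholders, not the paper's prescription (which derives them from measured blockwise sharpness).

```python
import torch

def blockwise_param_groups(model, base_lr, block_multipliers):
    """Per-block learning rates via optimizer param groups (sketch).

    `block_multipliers` maps a parameter-name prefix (e.g. "layers.3")
    to an lr multiplier; unmatched params keep multiplier 1.0. The paper
    derives the multipliers from measured blockwise sharpness, which this
    snippet does not compute.
    """
    groups = {}
    for name, p in model.named_parameters():
        prefix = next((k for k in block_multipliers if name.startswith(k)), None)
        mult = block_multipliers.get(prefix, 1.0)
        groups.setdefault(mult, []).append(p)
    return [{"params": ps, "lr": base_lr * m} for m, ps in groups.items()]

# Placeholder multipliers (not the paper's values):
# optimizer = torch.optim.AdamW(
#     blockwise_param_groups(model, 3e-4, {"embed": 0.5, "layers.0": 1.5}))
```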
Our paper studying the dynamics of a simple Markov model for CoT reasoning has been accepted to #ICML2025! (reposting a nice summary↓)
"Metastable Dynamics of Chain-of-Thought Reasoning" (Kim et al., 2025) breaks down how to improve LLM reasoning using search, reinforcement learning (RL), and distillation. Key takeaways: 🔍 Search identifies critical (hard) reasoning steps, reducing the number of steps needed…
First day of #ICLR2025, I will present "Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training" (Spotlight!). We found that even a few epochs of SAM applied at the end of training can significantly improve generalization. See you on the 24th!
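For reference, here is a minimal SAM step in PyTorch, assuming a closure `loss_fn` that recomputes the forward loss (a sketch, not our exact training code). Per the finding above, switching from SGD to this update only in the final epochs already suffices.

```python
import torch

def sam_step(model, loss_fn, optimizer, rho=0.05):
    """One Sharpness-Aware Minimization step (minimal sketch).

    Ascent: perturb weights by rho * g / ||g||, then descend using the
    gradient taken at the perturbed point.
    """
    loss = loss_fn()
    loss.backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (norm + 1e-12)
            p.add_(e)                 # ascent to the sharpest nearby point
            eps.append((p, e))
    optimizer.zero_grad()
    loss_fn().backward()              # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in eps:
            p.sub_(e)                 # restore the original weights
    optimizer.step()                  # descend with the SAM gradient
    optimizer.zero_grad()
```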

We are excited to introduce our new paper RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm. RaanA is a novel PTQ algorithm that is computationally efficient, calibration-light, and adaptable to diverse deployment scenarios. 🧵 (1/6)
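For readers new to post-training quantization, a generic round-to-nearest (RTN) baseline looks like the sketch below. This is background only, not RaanA's algorithm, which the paper develops well beyond this.

```python
import torch

def quantize_rtn(weight, bits=4):
    """Per-channel round-to-nearest weight quantization (generic baseline).

    `weight` is a 2-D [out_features, in_features] tensor. Returns integer
    codes plus one fp scale per output channel.
    """
    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4-bit signed
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)                  # avoid div-by-zero rows
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q, scale):
    """Approximate reconstruction of the original weights."""
    return q.to(scale.dtype) * scale
```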
ICML 2025's rebuttal process be like🤣: 👨‍💻 Authors: spend a whole week writing a careful rebuttal ✅ Reviewer: clicks "acknowledge" without reading 🚫 Authors: not allowed to reply anymore So what does "acknowledge" mean here? "You speak. I pretend to listen. Conversation over."🙃
Ryuichi Kanoh (@ryuichi_74), a member of my lab, has earned his Ph.D. and received the SOKENDAI Award. Congratulations! soken.ac.jp/news/2024/2025… Also, his latest work, achieving LMC (linear mode connectivity) in Tree Ensembles, has been accepted as a spotlight at ICLR2025! openreview.net/forum?id=UqYNP…
The code is released at github.com/zzp1012/SAM-in…. Reproduce our results if you are interested! In addition, we created some slides for better illustration. Check them out at zzp1012.github.io/data/talks/SAM…
#ICLR2025 Two papers got accepted!🎉 (1/2) openreview.net/forum?id=aD2uw… We study the implicit bias of SAM during the late phase of training, revealing that SAM efficiently selects flatter minima than SGD even when applied only in the last few epochs.
Today we introduce an AI co-scientist system, designed to go beyond deep research tools to aid scientists in generating novel hypotheses & research strategies. Learn more, including how to join the Trusted Tester Program, at goo.gle/417wJrA
🎉 Thrilled that our paper "On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent" is a Spotlight at #ICLR2025! Huge thanks to my collaborators & reviewers! Excited to discuss at the conference! 📄 Paper: openreview.net/forum?id=97rOQ…
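For context, the update analyzed in the paper is plain sign gradient descent; a minimal sketch (momentum and weight-decay variants omitted):

```python
import torch

@torch.no_grad()
def sign_gd_step(params, lr=1e-3):
    """One plain sign-gradient-descent step: w <- w - lr * sign(grad)."""
    for p in params:
        if p.grad is not None:
            p.add_(p.grad.sign(), alpha=-lr)
```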
Both papers were selected as Spotlights by the #ICLR 2025 committee! 🥳 Really appreciate the acknowledgement from my ACs and reviewers.
UC Berkeley's "Introduction to Mathematical Thinking" Lecture videos & slides: imt-decal.org