Weizhu Chen
@WeizhuChen
Microsoft
Happy to see @ypwang61 and the team did some interesting work here.
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks!
📍RLVR with one training example can boost, on MATH500:
- Qwen2.5-Math-1.5B: 36.0% → 73.6%
- Qwen2.5-Math-7B: 51.0% → 79.2%
📄 Paper: arxiv.org/abs/2504.20571…
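As a rough illustration of what "verifiable reward" means in RLVR, here is a minimal sketch of a binary math reward, assuming the model wraps its final answer in \boxed{...}; the helper names are hypothetical and this is not the paper's implementation.

```python
# Minimal sketch of a verifiable reward for math RLVR (hypothetical helpers,
# not the paper's code): reward 1.0 only if the model's boxed answer matches
# the reference answer exactly.
import re

def extract_boxed_answer(completion: str):
    """Return the content of the last \\boxed{...} in a completion, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Binary reward: 1.0 if the extracted answer equals the reference, else 0.0."""
    predicted = extract_boxed_answer(completion)
    return 1.0 if predicted is not None and predicted == reference_answer.strip() else 0.0

# In one-example RLVR, the same training question is sampled many times per step
# and the policy is updated (e.g., with PPO/GRPO) against this reward.
print(verifiable_reward(r"... so the area is \boxed{12}.", "12"))  # 1.0
```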
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
See our work in the workshop today. If you are looking for opportunities to work on efficient model architectures, or anything that makes training or inference run much faster on thousands of GPUs or more, please come talk to us or DM me. We are hiring.
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 github.com/microsoft/Arch… (1/4)
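For readers new to this family of ideas, here is a minimal sketch of μP-style learning-rate scaling with width, assuming the common Adam rule "hidden-matrix LR ∝ 1/width"; it is an illustration only, not the μP++ recipe, which is in the released pre-training code.

```python
# Minimal sketch of muP-style per-group learning rates (illustration only;
# the exact μP++ rules live in the open-sourced repo).
import torch

def mup_param_groups(model: torch.nn.Module, base_lr: float, base_width: int, width: int):
    """2D weight matrices get LR scaled by base_width/width; biases, norms,
    and embeddings keep the base LR."""
    scaled, unscaled = [], []
    for name, p in model.named_parameters():
        if p.ndim >= 2 and "embed" not in name:
            scaled.append(p)
        else:
            unscaled.append(p)
    return [
        {"params": scaled, "lr": base_lr * base_width / width},
        {"params": unscaled, "lr": base_lr},
    ]

# Usage idea: tune base_lr once at a small base_width, then reuse it at larger widths.
# optimizer = torch.optim.AdamW(mup_param_groups(model, 3e-4, base_width=256, width=4096))
```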
You may check our work on Phi4-mini-Flash-Reasoning. What I like most is the Gated Memory Unit (GMU) design, which can be applied in future model designs to achieve both quality and long context, along with μP++. @liliang_ren
Reasoning can be made much, much faster with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput…
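To make the GMU idea concrete, here is a minimal sketch, assuming the unit element-wise gates a memory state shared from an earlier layer with a learned gate of the current hidden state (a SwiGLU-like form). This is one reading of the design, not the exact Phi4-mini-Flash formulation; see the paper and repo for that.

```python
# Minimal sketch of a Gated Memory Unit: reuse a memory state computed once in
# an earlier layer instead of recomputing expensive sequence mixing later on.
import torch
import torch.nn as nn

class GatedMemoryUnit(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_model, bias=False)  # gate from current hidden state
        self.out_proj = nn.Linear(d_model, d_model, bias=False)   # mix gated memory back in

    def forward(self, hidden: torch.Tensor, shared_memory: torch.Tensor) -> torch.Tensor:
        # hidden, shared_memory: (batch, seq_len, d_model);
        # shared_memory comes from an earlier layer and is simply reused here.
        gate = torch.nn.functional.silu(self.gate_proj(hidden))
        return self.out_proj(shared_memory * gate)
```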
Synthesizing challenging problems on which the current model performs poorly is an important area in RL. Another thing that interests me is self-evolving learning via synthesizing questions/problems from which the model can learn continuously. You may check our work here: mastervito.github.io/MasterVito.SwS…
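A minimal sketch of that kind of weakness-driven synthesis loop, with caller-supplied callables (`generate_problems`, `solve`, `train_on`) that are hypothetical interfaces, not the SwS API: synthesize candidates, keep the ones the current model mostly fails, and train on those.

```python
# Minimal sketch of one round of self-evolving, weakness-driven problem synthesis.
def self_evolve_step(generate_problems, solve, train_on, seed_topics,
                     n_candidates=100, n_attempts=8, max_pass_rate=0.25):
    """generate_problems(topic, n) -> list of problems written by the model;
    solve(problem) -> bool (did the model pass a verifier?);
    train_on(problems) -> update the model (e.g., SFT/RL on verified solutions)."""
    hard_problems = []
    for topic in seed_topics:
        for problem in generate_problems(topic, n_candidates):
            passes = sum(solve(problem) for _ in range(n_attempts))
            if passes / n_attempts <= max_pass_rate:  # the model is weak here
                hard_problems.append(problem)
    train_on(hard_problems)
    return hard_problems
```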

Glad to see the team used a 3.8B model (Phi-4-mini-reasoning) to achieve 94.6 on MATH-500 and 57.5 on AIME-24. arxiv: arxiv.org/pdf/2504.21233 hf: huggingface.co/microsoft/Phi-… Azure: aka.ms/phi4-mini-reas…

Check out our tech report on Phi-4-mini and its multimodality.
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
We released Phi-4-mini (a 3.8B base LLM), a new SLM excelling in language, vision, and audio through a mixture of LoRAs, uniting three modalities in one model. I am so impressed with its new audio capability. I hope you can play with it and share your feedback with us. We also…
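As a rough illustration of the mixture-of-LoRAs idea, here is a minimal sketch assuming one frozen base linear layer with a separate low-rank adapter per modality, routed by the input's modality; this is an illustration of the concept, not the Phi-4-mini implementation.

```python
# Minimal sketch of a modality-routed mixture of LoRAs on top of a frozen base layer.
import torch
import torch.nn as nn

class MixtureOfLoRAsLinear(nn.Module):
    def __init__(self, base: nn.Linear, modalities=("text", "vision", "audio"), rank: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the shared base weights stay frozen
        self.lora_a = nn.ModuleDict(
            {m: nn.Linear(base.in_features, rank, bias=False) for m in modalities})
        self.lora_b = nn.ModuleDict(
            {m: nn.Linear(rank, base.out_features, bias=False) for m in modalities})
        for m in modalities:
            nn.init.zeros_(self.lora_b[m].weight)  # each adapter starts as a no-op

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # Route through the adapter matching the input modality; only that
        # adapter is trained on the corresponding data.
        return self.base(x) + self.lora_b[modality](self.lora_a[modality](x))
```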

I didn't see the talk, but the images I've seen of the slide seem quite offensive. Such generalizations should have no place at NeurIPS or anywhere else.