Shaokun Zhang
@ShaokunZhang1
Agentic AI. PhD student @PennState. Co-Creator of #AutoGen | Research Intern @NvidiaAI @MSFTResearch
Tool-using LLMs can learn to reason—without reasoning traces. 🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation. 📄 Paper: arxiv.org/pdf/2505.00024 💻…

🚀 Meet MassGen! 🛠️ An open-source project for multi-agent scaling. Inspired by @grok Heavy & Gemini DeepThink. Enable parallel intelligence sharing, iterative refinement & consensus across agents. @GoogleAI @OpenAI @xai MVP out now—star & feedback! 👇 github.com/Leezekun/MassG…
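The tweet above describes the core loop: agents answer in parallel, see each other's answers, refine, and converge on a consensus. A minimal toy sketch of that loop (not MassGen's actual API; the `run_consensus` helper and agent signature here are hypothetical illustrations):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_consensus(agents, prompt, rounds=2):
    """Toy sketch of parallel multi-agent consensus:
    each agent is a callable (prompt, peer_answers) -> answer.
    Agents run in parallel, see peers' previous answers each round,
    and the majority answer wins. Names are illustrative, not MassGen's API."""
    answers = [None] * len(agents)
    for _ in range(rounds):
        peers = list(answers)  # snapshot of last round's answers, shared with all
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(agent, prompt, peers) for agent in agents]
            answers = [f.result() for f in futures]
    # simple majority vote as the consensus rule
    return Counter(answers).most_common(1)[0][0]
```

The real project presumably layers LLM calls, richer sharing, and smarter aggregation on top of this skeleton; the sketch only shows the parallel-refine-vote shape.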
Document and Enterprise Intelligence is arguably one of the most important applications of VLMs and cloud services. NVIDIA VLM technologies help build commercial-grade models excelling in this area. The Eagle VLM Team, together with other colleagues at NVIDIA, is proud to be…
🥇Our NVIDIA Llama Nemotron Nano VL model is #1 on the OCRBench V2 leaderboard. Designed for advanced intelligent document processing and understanding, this model extracts diverse info from complex documents with precision, all on a single GPU. 📗 Get the technical details…
Confidential review is finally complete. Check it out here: github.com/NVlabs/Tool-N1
Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering…
Excited to join the College of Computing and Data Science at Nanyang Technological University, Singapore (@NTUsg) as an Assistant Professor this fall! 🙌 Grateful to my advisor @SharonYixuanLi and all who supported me along the way. Looking forward to the new chapter! 😄 🇸🇬
Fast-dLLM: Training-free Acceleration of Diffusion LLMs by Enabling KV Cache and Parallel Decoding
Our paper is accepted to ICML! #ICML2025🙌
Does anyone want to dig deeper into the robustness of Multimodal LLMs (MLLMs), beyond empirical observations? Happy to serve exactly this need through our new #ICML2025 paper, "Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach"!
「Nvidia, Reasoning, Agent」 Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning. Nemotron-Tool-N1 extends the RLVR idea to the tool-calling dimension, enabling 7B/14B models to comprehensively outperform GPT-4o on tool-use benchmarks. Excellent work!…
Appreciate the repost! 🙌
6. Nemotron-Research-Tool-N1 Introduces Tool-N1, a family of tool-using LLMs trained using a rule-based reinforcement learning (R1-style RL) approach, without reliance on supervised reasoning trajectories. x.com/ShaokunZhang1/…
🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions. We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations. 🧵 1/n
Cool paper from @nvidia Prior methods for training LLMs for tool use rely on imitation or distilled reasoning, limiting generalization. Nemotron-Research-Tool-N1 uses rule-based reinforcement learning. It trains models with binary rewards evaluating only tool call structure…
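The binary, structure-only reward described above can be sketched as a simple checker: the reward is 1.0 only if the generated tool call parses and matches a known tool's expected shape, with no judgment on the reasoning itself. This is an illustrative sketch assuming a JSON tool-call format; the paper's exact schema, tags, and reward details may differ:

```python
import json

def format_reward(completion: str, known_tools: list[str]) -> float:
    """Binary rule-based reward over tool-call structure only.

    Returns 1.0 iff the completion parses as JSON, names a known tool,
    and supplies a dict of arguments; otherwise 0.0. Hypothetical sketch,
    not the paper's exact reward implementation.
    """
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(call, dict):
        return 0.0
    if call.get("name") not in known_tools:
        return 0.0
    if not isinstance(call.get("arguments"), dict):
        return 0.0
    return 1.0
```

Because the reward checks only structure (not the chain of thought), the model is free to discover its own reasoning style during RL, which is the point the tweet highlights.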
𝗧𝗼𝗽 𝘀𝗲𝗰𝗿𝗲𝘁: 𝗔𝗴𝗲𝗻𝘁-𝗮𝘀-𝗮-𝗝𝘂𝗱𝗴𝗲 can be a great open-source #DeepWiki by just adding 2 code files. Swap github → openwiki in any repo URL 🫱 github.com/metauto-ai/age…
More results will be released soon. Stay tuned😀
I am a big believer in this line of research on "tool enhanced" LLMs. Most notably, here, the final RL tuning only uses a format checking reward. 🔗 arxiv.org/abs/2505.00024
265 pages of everything you need to know about building AI agents. 5 things that stood out to me about this report:
🔥 New talk announcement - Frontiers of LLM Agents: Memory, Tool Use, Multi-Modal Input, and RL with LLMs 🔥 What happens when LLM agents are designed to learn, adapt, and make decisions over time? @samianholt, a PhD researcher with nine first-author papers across NeurIPS,…