Shaokun Zhang
@ShaokunZhang1
Agentic AI. PhD student @PennState. Co-Creator of #AutoGen | Research Intern @NvidiaAI @MSFTResearch
Tool-using LLMs can learn to reason—without reasoning traces. 🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation. 📄 Paper: arxiv.org/pdf/2505.00024 💻…

🚀 Meet MassGen! 🛠️ An open-source project for multi-agent scaling. Inspired by @grok Heavy & Gemini DeepThink. Enable parallel intelligence sharing, iterative refinement & consensus across agents. @GoogleAI @OpenAI @xai MVP out now—star & feedback! 👇 github.com/Leezekun/MassG…
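The tweet above describes the core loop: agents answer in parallel, see each other's answers, refine, and converge on a consensus. A minimal toy sketch of that loop (not MassGen's actual API; the `run_consensus` helper and agent signature here are hypothetical illustrations):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_consensus(agents, prompt, rounds=2):
    """Toy sketch of parallel multi-agent consensus:
    each agent is a callable (prompt, peer_answers) -> answer.
    Agents run in parallel, see peers' previous answers each round,
    and the majority answer wins. Names are illustrative, not MassGen's API."""
    answers = [None] * len(agents)
    for _ in range(rounds):
        peers = list(answers)  # snapshot of last round's answers, shared with all
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(agent, prompt, peers) for agent in agents]
            answers = [f.result() for f in futures]
    # simple majority vote as the consensus rule
    return Counter(answers).most_common(1)[0][0]
```

The real project presumably layers LLM calls, richer sharing, and smarter aggregation on top of this skeleton; the sketch only shows the parallel-refine-vote shape.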
Document and Enterprise Intelligence is arguably one of the most important applications of VLMs and cloud services. NVIDIA VLM technologies help build commercial-grade models excelling in this area. The Eagle VLM Team, together with other colleagues at NVIDIA, is proud to be…
🥇Our NVIDIA Llama Nemotron Nano VL model is #1 on the OCRBench V2 leaderboard. Designed for advanced intelligent document processing and understanding, this model extracts diverse info from complex documents with precision, all on a single GPU. 📗 Get the technical details…
Confidential review is finally complete. Check it out here: github.com/NVlabs/Tool-N1
Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering…
Excited to join the College of Computing and Data Science at Nanyang Technological University, Singapore (@NTUsg) as an Assistant Professor this fall! 🙌 Grateful to my advisor @SharonYixuanLi and all who supported me along the way. Looking forward to the new chapter! 😄 🇸🇬
Fast-dLLM: Training-free Acceleration of Diffusion LLMs by Enabling KV Cache and Parallel Decoding
Our paper is accepted to ICML! #ICML2025🙌
Does anyone want to dig deeper into the robustness of Multimodal LLMs (MLLMs), beyond empirical observations? Happy to serve exactly this need through our new #ICML2025 paper, "Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach"!
「Nvidia, Reasoning, Agent」 Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning. Nemotron-Tool-N1 extends the RLVR idea to the tool-calling dimension, enabling 7B/14B models to comprehensively outperform GPT-4o on tool-use benchmarks. Excellent work!…
Appreciate the repost! 🙌
6. Nemotron-Research-Tool-N1 Introduces Tool-N1, a family of tool-using LLMs trained using a rule-based reinforcement learning (R1-style RL) approach, without reliance on supervised reasoning trajectories. x.com/ShaokunZhang1/…
🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions. We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations. 🧵 1/n
Cool paper from @nvidia Prior methods for training LLMs for tool use rely on imitation or distilled reasoning, limiting generalization. Nemotron-Research-Tool-N1 uses rule-based reinforcement learning. It trains models with binary rewards evaluating only tool call structure…
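The binary, structure-only reward described above can be sketched as a simple checker: the reward is 1.0 only if the generated tool call parses and matches a known tool's expected shape, with no judgment on the reasoning itself. This is an illustrative sketch assuming a JSON tool-call format; the paper's exact schema, tags, and reward details may differ:

```python
import json

def format_reward(completion: str, known_tools: list[str]) -> float:
    """Binary rule-based reward over tool-call structure only.

    Returns 1.0 iff the completion parses as JSON, names a known tool,
    and supplies a dict of arguments; otherwise 0.0. Hypothetical sketch,
    not the paper's exact reward implementation.
    """
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(call, dict):
        return 0.0
    if call.get("name") not in known_tools:
        return 0.0
    if not isinstance(call.get("arguments"), dict):
        return 0.0
    return 1.0
```

Because the reward checks only structure (not the chain of thought), the model is free to discover its own reasoning style during RL, which is the point the tweet highlights.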
𝗧𝗼𝗽 𝘀𝗲𝗰𝗿𝗲𝘁: 𝗔𝗴𝗲𝗻𝘁-𝗮𝘀-𝗮-𝗝𝘂𝗱𝗴𝗲 can be a great open-source #DeepWiki by just adding 2 code files. Swap github → openwiki in any repo URL 🫱 github.com/metauto-ai/age…
More results will be released soon. Stay tuned😀
I am a big believer in this line of research on "tool enhanced" LLMs. Most notably, here, the final RL tuning only uses a format checking reward. 🔗 arxiv.org/abs/2505.00024
265 pages of everything you need to know about building AI agents. 5 things that stood out to me about this report:
🔥 New talk announcement - Frontiers of LLM Agents: Memory, Tool Use, Multi-Modal Input, and RL with LLMs 🔥 What happens when LLM agents are designed to learn, adapt, and make decisions over time? @samianholt, a PhD researcher with nine first-author papers across NeurIPS,…