Fu-En (Fred) Yang
@FuEnYang1
Research Scientist @NVIDIAAI | Ph.D. @NTU_TW | Prev. Research Intern @NVIDIAAI | Vision & Language | Multimodal AI
🤖 How can we teach embodied agents to think before they act? 🚀 Introducing ThinkAct — a hierarchical Reasoning VLA framework with an MLLM for complex, slow reasoning and an action expert for fast, grounded execution. Slow think, fast act. 🧠⚡🤲

New paper introduces ThinkAct - a framework that teaches robots to reason before acting. It's like giving robots a moment to think through their next moves, just like we do. 🧵
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
Author's Explanation: x.com/FuEnYang1/stat…
Overview: ThinkAct introduces a dual-system framework that generates embodied reasoning plans with a multimodal LLM guided by reinforced action-aligned…
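The dual-system idea described above (a slow MLLM planner handing a latent plan to a fast action expert) can be pictured as a simple two-rate control loop. Below is only a hedged sketch of that structure; ReasoningMLLM, ActionExpert, DummyEnv, and replan_every are made-up names for illustration, not the paper's actual interfaces.

```python
# Minimal sketch of a slow-think / fast-act dual-system loop, in the spirit of
# the ThinkAct description above. All names here (ReasoningMLLM, ActionExpert,
# DummyEnv, replan_every) are hypothetical stand-ins, not the paper's API.

class ReasoningMLLM:
    """Slow system: a multimodal LLM that turns the observation and the
    language instruction into a latent plan (run infrequently)."""
    def plan(self, observation, instruction):
        # In the real system this would be reinforced, action-aligned latent
        # planning; here we just return a fixed-size placeholder vector.
        return [0.0] * 128

class ActionExpert:
    """Fast system: a lightweight policy that maps the current observation
    plus the latent plan to a grounded low-level action (run every step)."""
    def act(self, observation, latent_plan):
        return {"delta_pose": [0.0] * 6, "gripper": 0.0}

class DummyEnv:
    """Stand-in environment so the sketch runs end to end."""
    def reset(self):
        return {"image": None, "proprio": [0.0] * 7}

    def step(self, action):
        obs = {"image": None, "proprio": [0.0] * 7}
        done = False
        return obs, done

def control_loop(env, mllm, expert, instruction, max_steps=100, replan_every=10):
    obs = env.reset()
    latent_plan = mllm.plan(obs, instruction)          # slow think (once up front)
    for step in range(max_steps):
        if step > 0 and step % replan_every == 0:
            latent_plan = mllm.plan(obs, instruction)  # occasional re-planning
        action = expert.act(obs, latent_plan)          # fast act, every step
        obs, done = env.step(action)
        if done:
            break

control_loop(DummyEnv(), ReasoningMLLM(), ActionExpert(), "pick up the red block")
```

The point of the split is that the expensive reasoning model only runs every few control steps, while the small action expert keeps up with the robot's control rate in between.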
🚨This week's top AI/ML research papers: - GSPO - Diffusion Beats Autoregressive in Data-Constrained Settings - Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 - Rubrics as Rewards - Deep Researcher with Test-Time Diffusion - Learning without training - Stabilizing Knowledge,…
Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and a drop-in replacement for V1. Open-weight, with code and data on HF huggingface.co/nvidia/Llama-3…
Thanks @_akhaliq for sharing our latest VLA Reasoning work! Please see more details here: x.com/FuEnYang1/stat… @NVIDIAAIDev @NVIDIAAI @nvidia #NVIDIA #NVIDIAResearch #VLA #reasoning #RL
Nvidia presents ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team! deepmind.google/discover/blog/…
And today we have just open-sourced the Eagle 2.5 model huggingface.co/nvidia/Eagle2.… You are welcome to download it and give it a try! We will also open-source the fine-tuning code for Eagle 2/2.5 soon at github.com/NVlabs/Eagle. Stay tuned.
I did not notice this until just now. Thank you @andimarafioti for the recommendation! Very glad that even though Eagle 2 is not our latest work, people still find it very useful.
ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
Today we're excited to share a glimpse of what we're building at Generalist. As a first step towards our mission of making general-purpose robots a reality, we're pushing the frontiers of what end-to-end AI models can achieve in the real world. Here's a preview of our early…