Yu Gu
@yugu_nlp
AI researcher. Building next-generation agents. I'm staying off Twitter lately (maybe forever).
“What's the role of NLP/LLM researchers in agent research?” “Natural language is merely a tool for communication.” … These doubts and criticisms have circulated widely over the past two years. In my PhD dissertation, I want to provide a perspective that addresses these doubts…
Question to RL people: why does the reward in RL have to be numerical? Is it a design by nature, or is it mainly an expedient design to simplify the model? Eager to learn your opinions.
🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -…
🤯What we know about RL for reasoning might not hold outside math and code? We revisit established findings on RL for LLM reasoning on six domains (Math, Code, Science, Logic, Simulation, Tabular) and find that previous conclusions drawn on math and code are surprisingly…
Zihao has been aware of this for three months. There's a more generalized claim underlying recent assertions about RL (in the context of using Qwen for math) like you can do RL with one example/internal rewards/spurious rewards: for Qwen on math, you just don't need RL at all!…
Recently, I saw the papers "rl on one sample" and "spurious reward". The findings are interesting, but they are indeed expected. In fact, the math solving ability of the Qwen models is really easy to activate—𝐞𝐯𝐞𝐧 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐚𝐧𝐲 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 !🤣 I'd like to share…
⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for…
🚀 Thrilled to unveil the most exciting project of my PhD: Explorer — Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents TL;DR: A scalable multi-agent pipeline that leverages exploration for diverse web agent trajectory synthesis. 📄 Paper:…
Tired of editing methods that require training, handcrafted subjects, or external memory? 🚀 #UltraEdit: Training-, subject-, and memory-free, for Lifelong Model Editing. Compared to the prior best: ✅New SOTA on 4 datasets and 6 models 🏎️7× faster – 20K samples within 5 mins on a…
Good scientists have a deep psychological need for crisp definitions and self-consistent models of the world. Most people are comfortable holding worldviews that are fragmented, inconsistent, and fluid, where one's degree of belief in various things fluctuates based on context
Project DeepWiki Up-to-date documentation you can talk to, for every repo in the world. Think Deep Research for GitHub – powered by Devin. It’s free for open-source, no sign-up! Visit deepwiki com or just swap github → deepwiki on any repo URL:
🎉 Announcing the first Open Science for Foundation Models (SCI-FM) Workshop at #ICLR2025! Join us in advancing transparency and reproducibility in AI through open foundation models. 🤝 Looking to contribute? Join our Program Committee: bit.ly/4acBBjF 🔍 Learn more at:…
Almost all my knowledge about robotics comes from Luke!
🚨We just released the data generation code for RoboSpatial! 💾 github.com/NVlabs/RoboSpa… 📢 And yes, RoboSpatial is a #CVPR2025 Oral 🏆🔥
Will be at ICLR from April 24-28. Can't wait to see my old/new friends! Also, please reach out if you wanna discuss anything about research in agents! #ICLR2025