Yanheng He
@YanhengHe
Undergrad @sjtu1896 ACM Class. Intern @NYU_Courant
🔥 Excited to share our work "Efficient Agent Training for Computer Use" Q: Do computer use agents need massive data or complex RL to excel? A: No, with just 312 high-quality trajectories, Qwen2.5-VL can outperform Claude 3.7, setting a new SOTA for Windows computer use. 1/6

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Amazing step toward digital superintelligence!
ChatGPT agent is ready to introduce itself. openai.com/live
Thanks for bringing this to my attention. I honestly wasn’t aware of the situation until the recent posts started going viral. I would never encourage my students to do anything like this—if I were serving as an Area Chair, any paper with this kind of prompt would be…
The race for LLM "cognitive core" - a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing. Its features are slowly crystalizing: - Natively multimodal…
I’m so excited to announce Gemma 3n is here! 🎉 🔊Multimodal (text/audio/image/video) understanding 🤯Runs with as little as 2GB of RAM 🏆First model under 10B with @lmarena_ai score of 1300+ Available now on @huggingface, @kaggle, llama.cpp, ai.dev, and more
Had a great time at this CVPR community-building workshop---lots of fun discussions and some really important insights for early-career researchers. I also gave a talk on "Research as an Infinite Game." Here are the slides: canva.com/design/DAGp0iR…
In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209 Schedule: sites.google.com/view/standoutc… We have an exciting lineup of invited talks and candid…
📣 New Discovery on Computer Use Agent With just 312 high-quality trajectories + open-source model, we've surpassed Claude 3.7 Sonnet (thinking) in computer use capabilities 🚀 ⚡️ In the new era of AI Agent training, many key questions remain: • Can open-source models + small…
Excited to share PC Agent-E, our new work on efficient agent training for computer use! Trained with only❗️312 human trajectories enhanced by Claude 3.7 Sonnet, PC Agent-E achieves a 🤯 141% relative improvement and even surpasses Claude 3.7 Sonnet (thinking)!
312 quality trajectories + open-source model beats Claude 3.7 Sonnet (thinking) in computer use 🚀 We answer the following important questions in our recent tech report: github.com/GAIR-NLP/PC-Ag… 1. Can open-source models + small high-quality datasets outperform top closed-source…
🔥 Excited to share our work "LIMR: Less is More for RL Scaling" Q: What determines the effectiveness of RL training data? A: Alignment with the model's learning journey. 1,389 strategic samples ≥ 8,523 full dataset 🤯 📄: github.com/GAIR-NLP/LIMR/… 💻: github.com/GAIR-NLP/LIMR 1/6
🤔 What makes 3D LLMs truly unique compared to 2D VLMs? 🕵️‍♂️ We uncovered the "2D-Cheating" problem in 3D LLM evaluation: Many tasks can be easily solved by 2D VLMs using rendered images, failing to test true 3D capabilities!
🤔 Struggling to train capable AI agents due to a lack of quality data? 🚀 Meet PC Tracker & PC Agent - our groundbreaking system that learns from real human computer operation processes to handle complex digital work! Watch how PC Agent automatically creates slides about Attention…