Peng (Richard) Xia
@richardxp888
PhD Student @UNC @unccs @unc_ai_group | Formerly @MSFTResearch @MonashUni | Multimodal, Agent, RAG, Healthcare
Will conversation history help reasoning? We found that when models mess up once, they often get stuck. Surprisingly, a simple “try again” fixes this — and boosts reasoning.🧵 Project Page: unary-feedback.github.io
🚀Introducing the Hierarchical Reasoning Model🧠🤖 Inspired by the brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, with no pretraining or CoT! Unlock next AI breakthrough with…
GLIMPSE 👁️ | What Do LVLMs Really See in Videos? A new benchmark for video understanding: 3,269 videos and 4,342 vision-centric questions across 11 spatiotemporal reasoning tasks. Test your model to see if it truly thinks with video—or is merely performing frame scanning.
ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.
Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life. One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s…
AGENT KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
🤔 How can AI Agents learn from past experiences across different tasks, instead of starting from scratch? 🚀 We just released Agent KB! Our framework uses a teacher-student mechanism to let agents reuse experience, enabling cross-task knowledge transfer. 🔥 The results are…
🔥Excited to share our latest work: Agent KB. It achieves a new open-source SOTA on the GAIA benchmark! We enable agents to learn from each other's experiences across tasks through hierarchical experience sharing. Paper📜: huggingface.co/papers/2507.06… Code🧑‍💻: github.com/OPPO-PersonalA…
🚀Introducing GTA1 – our new GUI Agent that leads the OSWorld leaderboard with a 45.2% success rate, outperforming OpenAI's CUA! GTA1 improves two core components of GUI agents: Planning and Grounding. 🧠 Planning: A generic test-time scaling strategy that concurrently samples…
Check out our latest paper on Memory📝 + Deep Research + Agent! 🔥New SOTA on GAIA!
🧠 How can AI evolve from statically 𝘵𝘩𝘪𝘯𝘬𝘪𝘯𝘨 𝘢𝘣𝘰𝘶𝘵 𝘪𝘮𝘢𝘨𝘦𝘴 → dynamically 𝘵𝘩𝘪𝘯𝘬𝘪𝘯𝘨 𝘸𝘪𝘵𝘩 𝘪𝘮𝘢𝘨𝘦𝘴 as cognitive workspaces, similar to the human mental sketchpad? 🔍 What’s the 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗿𝗼𝗮𝗱𝗺𝗮𝗽 from tool-use → programmatic…
Check out our new paper on thinking with images! A new promising direction for multimodal reasoning!🤩
Excited to share our new survey on the reasoning paradigm shift from "Think with Text" to "Think with Image"! 🧠🖼️ Our work offers a roadmap for more powerful & aligned AI. 🚀 📜 Paper: arxiv.org/pdf/2506.23918 ⭐ GitHub (400+🌟): github.com/zhaochen0110/A…
Glad to be part of this project! Being an AI scientist means there’s still tons to learn and do, haha! 🤖💪
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.