Jiahao Qiu
@JiahaoQiu99
PhD @Princeton | Prev. Undergrad @SJTU1896 @UMichCSE
The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI…

We don’t have AI that self-improves yet, and when we do, it will be a game-changer. With more wisdom now than in the GPT-4 days, it's obvious that it will not be a “fast takeoff,” but rather extremely gradual over many years, probably a decade. The first thing to know is that…
🥍🥍Excited to share that "Collab: Controlled Decoding Using Mixture of Agents for AI Alignment" has been accepted at #ICLR2025 Q. How to provably combine multiple #expert #LLMs for a target task at #inferencetime ?? 💥 Collab More Details coming soon...
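A toy sketch of the idea behind combining expert LLMs at inference time: at each decoding step, every expert proposes a next-token distribution, and the token is chosen from the expert whose proposal scores highest under a task-specific value function. This is an illustrative simplification, not the actual Collab algorithm; `mixture_decode_step` and `value_fn` are hypothetical names.

```python
# Toy sketch of token-level controlled decoding with a mixture of
# expert policies (illustrative only; not the actual Collab method).
# Each expert proposes a next-token distribution; we pick the token
# from the expert whose top proposal scores highest under a
# task-specific value function.

def mixture_decode_step(expert_dists, value_fn):
    """expert_dists: list of {token: prob} dicts; returns chosen token."""
    best_token, best_score = None, float("-inf")
    for dist in expert_dists:
        token = max(dist, key=dist.get)        # this expert's top token
        score = dist[token] * value_fn(token)  # value-weighted probability
        if score > best_score:
            best_token, best_score = token, score
    return best_token

# Example: two experts; the value function prefers tokens aligned
# with the target task, steering decoding toward the second expert.
dists = [{"a": 0.6, "b": 0.4}, {"c": 0.9}]
chosen = mixture_decode_step(dists, lambda t: 2.0 if t == "c" else 1.0)
```

In the full setting, the value function would itself come from a learned reward or Q-function evaluated at inference time, with no retraining of the expert models.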
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
ReasonFlux-PRM-1.5B/7B: New trajectory-aware PRMs that evaluate how LLMs reason — not just what they output. ✅ Better data selection ✅ Stronger RL policy guidance ✅ Improved test-time scaling Paper: arxiv.org/abs/2506.18896 Code and Model: github.com/Gen-Verse/Reas…
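The core idea of a trajectory-aware PRM is scoring the intermediate reasoning steps rather than only the final answer; a trajectory-level score can then aggregate the per-step scores. A minimal sketch, assuming a hypothetical list of per-step scores (the real ReasonFlux-PRM models and interfaces may differ):

```python
# Minimal sketch of trajectory-aware process reward aggregation
# (illustrative; hypothetical per-step scores, not the actual
# ReasonFlux-PRM API). A PRM scores each intermediate reasoning
# step; a trajectory score aggregates them.

def score_trajectory(step_scores, agg="mean"):
    """Aggregate per-step PRM scores into one trajectory score."""
    if not step_scores:
        raise ValueError("trajectory has no steps")
    if agg == "mean":
        return sum(step_scores) / len(step_scores)
    if agg == "min":  # pessimistic: the weakest step dominates
        return min(step_scores)
    raise ValueError(f"unknown aggregator: {agg}")

# Example: best-of-N selection between two candidate trajectories,
# as used for data selection or test-time scaling. The "min"
# aggregator penalizes a single flawed intermediate step.
traj_a = [0.9, 0.8, 0.95]   # hypothetical per-step scores
traj_b = [0.9, 0.2, 0.95]   # one flawed intermediate step
best = max([traj_a, traj_b], key=lambda t: score_trajectory(t, "min"))
```

A final-answer-only reward would rate these two trajectories similarly; the step-level view is what separates them.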
Research with amazing collaborators @JizeJiang, @MeitangLi, and @JingchengYang, guided by great advisors and supported by the generous help of talented researchers @BowenJin13, @XingyuFu2, and many open-source contributors (easyr1, verl, vllm... etc).
Excited to introduce VTool-R1! We’ve trained VLMs to “think visually” using RL, blending Python-based 🖼️visual edits with💡textual Chain-of-Thought reasoning. Our trained Qwen2.5-VL-32B surpasses GPT-4o on ChartQA & TableVQA, and even the compact Qwen2.5-VL-7B significantly…
What is an agent? What is the optimal behavior for achieving a predefined goal? And how can that behavior policy be learned? We formally introduce a systematic Theory of Agent (ToA), analogous to the cognitive framework of Theory of Mind (ToM). Where ToM refers to the ability to…
Agent Distillation vs. LLM Distillation: Alita proposes agent distillation, a departure from the traditional distillation paradigm that is much cheaper and easier via automatic MCP generation! Our experiments show great improvement on the GAIA validation set through agent distillation.…

🚀Shallow alignment is now an important problem in LLM alignment. Dr. Xiangyu Qi first identified this problem in the field of safety alignment in "Safety Alignment Should Be Made More Than Just a Few Tokens Deep." 🌟Our newest research systematically and comprehensively validates…

📢📢We are organizing a workshop at #NeurIPS 2025 on Emergent Trust Risks in Large Reasoning Models, and we are inviting members to join our Program Committee. If you are interested in any topic related to LLM safety, we welcome your participation🤩🤩! forms.gle/wjoVyW1Hq1M9Sg…
It is interesting that, on the second day after I posted the tweet introducing Alita, the GAIA validation leaderboard was removed. GAIA Leaderboard: huggingface.co/spaces/gaia-be… RIP🕯️🕯️🕯️

I heard that someone refined their agent product using Alita’s paradigm in one day and achieved great performance. That is super cool. For more details on Alita: github.com/CharlesQ9/Alita Paper Link: arxiv.org/pdf/2505.20286
I just updated more discussions in github.com/CharlesQ9/Alita.
Using LLMs to build AI scientists is all the rage now (e.g., Google’s AI co-scientist [1] and Sakana’s Fully Automated Scientist [2]), but how much do we understand about their core scientific abilities? We know how LLMs can be vastly useful (solving complex math problems) yet…