Pengfei Liu
@stefan_fee
Associate Prof. at SJTU, leading GAIR Lab (http://plms.ai). Co-founder of Inspired Cognition. Postdoc at @LTIatCMU. Previously FNLP, @MILAMontreal,
The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation? Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by @AIatMeta: github.com/GAIR-NLP/anole

RepoST was accepted to @COLM_conf !!! See you in Montreal 🚀 #COLM2025
How to construct repo-level coding environments in a scalable way? Check out RepoST: an automated framework to construct repo-level environments using Sandbox Testing (repost-code-gen.github.io). Models trained with RepoST data can generalize well to other datasets (e.g., RepoEval).
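For intuition, here is a rough, hypothetical sketch of the sandbox-testing idea: copy a focal function and its repo context into an isolated script, attach tests, and run them in a throwaway process. The helper names (`build_sandbox`, `run_sandbox`) are illustrative, not RepoST's actual interface.

```python
# Hypothetical illustration of "sandbox testing" for repo-level environments.
# Not RepoST's real API; just the shape of the idea.
import subprocess
import tempfile
from pathlib import Path

def build_sandbox(context_code: str, focal_code: str, test_code: str) -> Path:
    """Write repo context + focal function + tests into one self-contained file."""
    sandbox = Path(tempfile.mkdtemp()) / "sandbox_test.py"
    sandbox.write_text("\n\n".join([context_code, focal_code, test_code]))
    return sandbox

def run_sandbox(sandbox: Path, timeout: int = 30) -> bool:
    """Execute the sandboxed tests in a subprocess; exit code 0 means the environment is usable."""
    proc = subprocess.run(
        ["python", "-m", "pytest", str(sandbox), "-q"],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode == 0
```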
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:…
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: ethanc111.github.io/factool_websit… (1/n)
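For readers curious about the mechanics, a minimal sketch of a tool-augmented factuality check in the FacTool spirit (claim extraction, then evidence retrieval, then verification) might look like the following. `call_llm` and `web_search` are placeholders, not FacTool's actual API.

```python
# Minimal, hypothetical claim-verification pipeline: extract claims, retrieve
# evidence with a tool, and let an LLM judge each claim against the evidence.

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (hosted or local model)."""
    raise NotImplementedError

def web_search(query: str, k: int = 3) -> list[str]:
    """Placeholder for a search tool returning k evidence snippets."""
    raise NotImplementedError

def check_factuality(generated_text: str) -> list[dict]:
    # 1) Break the generated output into atomic, checkable claims.
    claims = call_llm(
        f"List each verifiable factual claim in the text, one per line:\n{generated_text}"
    ).splitlines()

    results = []
    for claim in filter(None, map(str.strip, claims)):
        # 2) Retrieve external evidence for the claim (the "tool" step).
        evidence = web_search(claim)
        # 3) Judge the claim against the evidence.
        verdict = call_llm(
            "Given the evidence below, answer SUPPORTED or REFUTED and explain briefly.\n"
            f"Claim: {claim}\nEvidence:\n" + "\n".join(evidence)
        )
        results.append({"claim": claim, "verdict": verdict, "evidence": evidence})
    return results
```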
Blog - abinesh-mathivanan.vercel.app/en/posts/short… Read 'OctoThinker' last week and it's so cool. Great work by @SinclairWang1 @FaZhou_998 @stefan_fee
Tech history: Every time humanity hits a tech wall, we just wait for someone named Ilya to show up and save the world :) - Neural nets stuck? - Language models plateau? - ... (skip tons of stuff) - ... - Superintelligence coming?
We don’t have AI that self-improves yet, and when we do it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff”, but rather extremely gradual across many years, probably a decade. The first thing to know is that…
What foundation models do we REALLY need for the RL era? And what pre-training data? Excited to share our work: OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling arxiv.org/pdf/2506.20512 ✨ Key breakthroughs: - First RL-focused mid-training approach - Llama…
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?…
nice discussion
🧵Interesting paper—great to see the emphasis on large token counts, which is always appreciated. 😅But some of the results are... puzzling. For example, Table 3 essentially suggests that MegaMath is a non-math corpus. This is weird, especially given the care we've taken during…
The real breakthrough isn't better AI—it's breaking free from nature's constraints We're witnessing a paradigm shift from "passive adaptation" to "active construction" in AI training. 🌊 The old way: AI learns from whatever data naturally exists • Constrained by existing…
📑Interesting paper by the GAIR community: Thinking with Generated Images🔥 enables a single large multimodal model to generate and reason with visual thoughts, greatly improving its ability to tackle complex vision and multimodal tasks. huggingface.co/papers/2505.22…
312 quality trajectories + open-source model beats Claude 3.7 Sonnet (thinking) in computer use 🚀 We answer the following important questions in our recent tech report: github.com/GAIR-NLP/PC-Ag… 1. Can open-source models + small high-quality datasets outperform top closed-source…
🔥 Excited to share our work "Efficient Agent Training for Computer Use" Q: Do computer use agents need massive data or complex RL to excel? A: No, with just 312 high-quality trajectories, Qwen2.5-VL can outperform Claude 3.7, setting a new SOTA for Windows computer use. 1/6
📣 New Discovery on Computer Use Agent With just 312 high-quality trajectories + open-source model, we've surpassed Claude 3.7 Sonnet (thinking) in computer use capabilities 🚀 ⚡️ In the new era of AI Agent training, many key questions remain: • Can open-source models + small…
Excited to share PC Agent-E, our new work on efficient agent training for computer use! Trained with only❗️312 human trajectories enhanced by Claude 3.7 Sonnet, PC Agent-E achieves a 🤯 141% relative improvement and even surpasses Claude 3.7 Sonnet (thinking)!
This is for you @AIatMeta
The Llama team must read the OctoThinker Notion report ASAP if they want to make reasoner models that aren't DOA before LlamaCon. There's still time. With their GPU largesse they can do it.
We are sharing this progress report now at the booth 260 poster in Hall 3 of the ICLR venue.
🚨New blog alert! Working on LLM x RL? You don’t want to miss this. Most SOTA RL results today rely on Qwen2.5 base models, but swap in Llama at the same model size and RL training dynamics shift drastically—RL from base often fails. Why? We ran a series of carefully controlled…
🔥 Introducing ToRL: Scaling Tool-Integrated RL directly from base models! LLMs discover optimal reasoning+tool strategies with no presets. ToRL-7B hits 43.3% on AIME24, +14% over no-tool RL, +17% over Qwen-TIR. 📝: arxiv.org/abs/2503.23383… 💻: github.com/GAIR-NLP/ToRL 1/9
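As a rough illustration of what a tool-integrated rollout can look like (the kind of trajectory ToRL optimizes with RL), here is a hypothetical sketch: the model interleaves reasoning with executable code blocks, each block is run, and the interpreter output is fed back before generation continues. `generate`, `run_python`, and the code-block convention are assumptions, not ToRL's exact interface.

```python
# Hypothetical tool-integrated rollout: alternate model generation and code
# execution until a final answer appears. Reward in RL training would come from
# checking the final answer against ground truth (verifiable reward).
import re
import subprocess
import tempfile

FENCE = "`" * 3  # three backticks delimit a code block in the model's output
CODE_RE = re.compile(re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE), re.DOTALL)

def run_python(code: str, timeout: int = 10) -> str:
    """Execute a code snippet in a separate process and capture stdout/stderr."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(["python", path], capture_output=True, text=True, timeout=timeout)
    return proc.stdout + proc.stderr

def rollout(generate, question: str, max_turns: int = 4) -> str:
    """Interleave model continuations with interpreter feedback."""
    context = question
    for _ in range(max_turns):
        completion = generate(context)      # model continues the trajectory
        context += completion
        blocks = CODE_RE.findall(completion)
        if not blocks:                      # no tool call -> treat as final answer
            break
        output = run_python(blocks[-1])     # execute the latest code block
        context += f"\n[interpreter output]\n{output}\n"
    return context
```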