Tong Chen
@tomchen0
PhD student @uwcse @uwnlp
LLMs naturally memorize some of their pre-training data verbatim. We study whether post-training can be an effective way to mitigate unintentional reproduction of pre-training data. 🛠️ No changes to pre-training or decoding 🔥 Training models to latently distinguish between memorized…

WHY do you prefer one thing over another? Reward models treat preference as a black box 😶‍🌫️ but human brains 🧠 decompose decisions into hidden attributes. We built the first system to mirror how people really make decisions in our #COLM2025 paper 🎨 PrefPalette ✨ Why it matters 👉🏻🧵
Some updates 🚨 I finished my Ph.D. at @uwcse in June 2025! After a year at AI2 as a Research Scientist, I am joining CMU @LTIatCMU & @mldcmu (courtesy) as an Assistant Professor in Fall 2026. The journey, acknowledgments & recruiting in 🧵
🤔 How do we train AI models that surpass their teachers? 🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯 The secret? Learn from the *differences* in weak data pairs! 📜 arxiv.org/abs/2507.06187 🧵 below
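The core idea above — that a pair of weak responses can still teach a model via their *difference* — can be illustrated with a DPO-style pairwise loss. This is a toy sketch, not the paper's implementation: the log-probabilities, reference values, and beta are invented, and the point is only that the loss depends on the gap between responses, not their absolute quality.

```python
import math

def dpo_pair_loss(lp_chosen, lp_rejected, lp_chosen_ref, lp_rejected_ref, beta=0.1):
    """DPO-style loss on one preference pair. Only the *delta* between
    the chosen and rejected responses (relative to a reference model)
    enters the loss, not how good either response is in absolute terms."""
    delta = (lp_chosen - lp_chosen_ref) - (lp_rejected - lp_rejected_ref)
    return -math.log(1.0 / (1.0 + math.exp(-beta * delta)))

# Two weak responses: both low-likelihood, but one is slightly better.
weak_pair = dpo_pair_loss(-120.0, -125.0, -121.0, -124.0)
# A pair of much higher-likelihood responses with the SAME gap.
strong_pair = dpo_pair_loss(-20.0, -25.0, -21.0, -24.0)
# Identical losses: weak data pairs carry the same training signal.
assert abs(weak_pair - strong_pair) < 1e-9
```

The takeaway of the sketch: under a pairwise objective, "only weak data" is not a dead end, because the gradient comes from the difference within each pair.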
Can data owners & LM developers collaborate to build a strong shared model while each retains control of their data? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data…
Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵
Reasoning benchmarks (e.g., MMLU Pro and GPQA) have seen little benefit from naive RAG. But can we flip this? 🔥Introducing CompactDS: ✅Web-scale coverage ✅Runs with just 100GB RAM ✅Matches search engines The simplest RAG pipeline can even compete with agentic…
Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨ our new, challenging instruction-following benchmark! Loved working w/ @valentina__py! Personal highlight: our multi-turn eval setting makes it possible to isolate constraint-following from the rest of the instruction 🔍
💡 Beyond math/code, instruction following with verifiable constraints is well-suited to RLVR. But the existing set of constraints and verifier functions is limited, and most models overfit on IFEval. We introduce IFBench to measure model generalization to unseen constraints.
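A "verifiable constraint" of the kind described above can be checked by a plain function, which is what makes it usable as an RLVR reward. A minimal sketch — the specific constraints and helper names here are invented for illustration, not taken from IFBench:

```python
def verify_word_count(response: str, max_words: int) -> bool:
    # Constraint: "answer in at most N words" — checkable with no judge model.
    return len(response.split()) <= max_words

def verify_keyword(response: str, keyword: str) -> bool:
    # Constraint: "the response must mention <keyword>".
    return keyword.lower() in response.lower()

def rlvr_reward(response: str) -> float:
    # Binary verifier-based reward: 1.0 only if every constraint passes.
    checks = [
        verify_word_count(response, max_words=10),
        verify_keyword(response, keyword="Seattle"),
    ]
    return float(all(checks))

assert rlvr_reward("Seattle is rainy.") == 1.0
assert rlvr_reward("Portland is rainy.") == 0.0
```

Because each verifier is deterministic code, a model can only score well on unseen constraints by actually following instructions, not by pattern-matching a fixed benchmark.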
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔 We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats! arxiv.org/abs/2506.04689
Wanna 🔎 inside Internet-scale LLM training data w/o spending 💰💰💰? Introducing infini-gram mini, an exact-match search engine with 14x less storage req than the OG infini-gram 😎 We make 45.6 TB of text searchable. Read on to find our Web Interface, API, and more. (1/n) ⬇️
A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland @umdcs this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)
LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content. We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new, and 38%+ projection of future (arxiv.org/abs/2505.24302).
Next week on Wednesday, June 11th we're excited to welcome @StellaLisy for a session on "Spurious Rewards: Rethinking Training Signals in RLVR." Thanks to @AhmadMustafaAn1 for organizing this session! 🔥 Learn more: cohere.com/events/Cohere-…
🚨 New Paper! 🚨 Are guard models slow, language-specific, and modality-limited? Meet OmniGuard, which detects harmful prompts across multiple languages & modalities with a single approach, achieving SOTA performance in all 3 modalities while being 120X faster 🚀 arxiv.org/abs/2505.23856
Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: + 28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
Excited to share that our paper "Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder" is accepted to #ACL2025! Preprint: arxiv.org/pdf/2411.05195 Thank @SimonShaoleiDu and @PangWeiKoh so much for your support and guidance throughout the journey!
Curious about what affects the scaling behaviors of foundation models in neuroscience? Check out @lpjiang97 work below
I'm excited to share our latest work — "Data Heterogeneity Limits the Scaling Effect of Pretraining in Neural Data Transformers", where we examined the effect of scaling up pretraining data in neural foundation models carefully.🧐 (1/9) Preprint: biorxiv.org/content/10.110…
Accepted by #ACL2025! Congrats @mingdachen and the team🥳 Several cool ideas: - Maintain an explicit editable working memory during generation; - Actively integrate external feedback (factual check w/ VeriScore); A smart LM learns to memorize, a smarter LM learns to forget too!
Meta presents Improving Factuality with Explicit Working Memory Presents EWE, a novel approach that enhances factuality in long-form text generation by integrating a working memory that receives real-time feedback from external resources EWE outperforms strong baselines on four…