Jeffrey Cheng
@jeff_cheng_77
incoming phd @PrincetonPLI | prev masters @jhuclsp
‼️Tired of dealing with long reasoning chains?‼️ Introducing Compressed Chain of Thought (CCoT), a framework to perform efficient reasoning through a few dense representations in place of long sequences of discrete tokens. 📜: arxiv.org/abs/2412.13171
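A rough sketch (mine, not the paper's) of the core idea: instead of decoding hundreds of discrete CoT tokens, the model rolls out a handful of continuous vectors and feeds them back as inputs before answering. `CompressedCoT`, `to_input`, and the vector count are placeholder names and choices, not CCoT's actual implementation.

```python
import torch
import torch.nn as nn

class CompressedCoT(nn.Module):  # hypothetical name, not the paper's code
    def __init__(self, base_lm: nn.Module, d_model: int, num_dense: int = 8):
        super().__init__()
        self.base_lm = base_lm        # anything mapping (B, S, D) -> (B, S, D)
        self.num_dense = num_dense    # a few dense vectors vs. hundreds of CoT tokens
        self.to_input = nn.Linear(d_model, d_model)  # hidden state -> input space

    def forward(self, prompt_embeds: torch.Tensor) -> torch.Tensor:
        seq = prompt_embeds                          # (batch, seq, d_model)
        for _ in range(self.num_dense):
            hidden = self.base_lm(seq)               # run the LM over the prefix
            nxt = self.to_input(hidden[:, -1:, :])   # keep the last hidden state...
            seq = torch.cat([seq, nxt], dim=1)       # ...as the next dense "thought"
        return seq  # answer decoding would condition on these compressed thoughts

# toy usage: a 2-layer encoder stands in for the base LM
lm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2)
out = CompressedCoT(lm, d_model=64)(torch.randn(2, 10, 64))  # (2, 18, 64)
```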

We have finally released the 📝paper for 🥂FineWeb2, our large multilingual pre-training dataset. Along with general (and exhaustive) multilingual work, we introduce a concept that can also improve English performance: deduplication-based upsampling, which we call rehydration.
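A toy sketch of what deduplication-based upsampling could look like; the log-scaled weighting and the cap below are my assumptions, not FineWeb2's actual recipe:

```python
import math

def rehydrate(docs, max_copies: int = 5):
    """docs: list of (text, cluster_size), where cluster_size counts how many
    near-duplicates the document had before deduplication."""
    out = []
    for text, cluster_size in docs:
        # dedup keeps one copy per cluster, but the duplicate count is a
        # quality signal: upsample widely duplicated docs, with a cap so
        # boilerplate cannot dominate the mix
        copies = min(max_copies, max(1, round(math.log2(cluster_size + 1))))
        out.extend([text] * copies)
    return out

corpus = rehydrate([("rare page", 1), ("popular article", 200)])  # 1 + 5 copies
```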
Check out our work on fair comparison of KV cache reduction methods, and PruLong, one of the most effective and easiest-to-use memory reduction methods for long-context LMs!
There are many KV cache-reduction methods, but a fair comparison is challenging. We propose a new unified metric called “critical KV footprint”. We compare existing methods and propose a new one - PruLong, which “prunes” certain attn heads to only look at local tokens. 1/7
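A minimal sketch of the mechanism described above: restrict selected heads to a local window so their KV cache stays bounded. Which heads to prune and the window size are placeholders here, not PruLong's learned choices.

```python
import torch

def local_head_scores(scores: torch.Tensor, local_heads, window: int = 128):
    """scores: (batch, heads, q_len, k_len) pre-softmax attention scores."""
    q_len, k_len = scores.shape[-2:]
    qi = torch.arange(q_len).unsqueeze(1)    # query positions, (q_len, 1)
    ki = torch.arange(k_len).unsqueeze(0)    # key positions,   (1, k_len)
    causal = ki <= qi                        # standard causal mask
    local = causal & (qi - ki < window)      # pruned heads: recent tokens only
    out = scores.masked_fill(~causal, float("-inf"))
    out[:, local_heads] = scores[:, local_heads].masked_fill(~local, float("-inf"))
    return out  # KV entries older than `window` can be dropped for local heads

masked = local_head_scores(torch.randn(1, 8, 16, 16), local_heads=[0, 3], window=4)
```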
🚨 Want models to make better use of, and stay grounded in, the provided knowledge? We introduce Context-INformed Grounding Supervision (CINGS)! Training LLMs with CINGS significantly boosts grounding abilities in both text and vision-language models compared to standard instruction tuning.
🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks,…
I am excited to share that I will join @StanfordAILab for my PhD in Computer Science in Fall 2025. Immense gratitude to my mentors: @ben_vandurme @DanielKhashabi @TianxingH @jackjingyuzhang @orionweller @tsvetshop Lauren Gardner @du_hongru @StellaLisy @hiaoxui 🧵:
I am thrilled to share that I will be starting my PhD in CS at Princeton University, advised by @danqi_chen. Many thanks to all those who have supported me on this journey: my family, friends, and my wonderful mentors @ben_vandurme, @ruyimarone, and @orionweller at @jhuclsp.
🚨 Our latest paper is now on ArXiv! 👻 (w/ @ben_vandurme) SpectR: Dynamically Composing LM Experts with Spectral Routing (1/4) 🧵
Wish you could get a Wikipedia style article for unfolding events? Introducing WikiVideo: a new multimodal task and benchmark for Wikipedia-style article generation from multiple videos!
👁️Recent works use LLMs for social simulations—but can these agents help shape effective policies? 💥Our new paper tackles a bold question many have wondered about: Can generative agent societies simulate to inform public health policy? 🔗: arxiv.org/abs/2503.09639
Our latest on compressed representations: Key-Value Distillation (KVD). Query-independent transformer compression with offline supervised distillation.
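For intuition only, a toy version of what "query-independent compression with offline supervised distillation" could mean: learn a pooling of the full KV cache into a few slots, trained so attention over the compressed cache matches attention over the full one. The linear pooling is my assumption, not KVD's method.

```python
import torch
import torch.nn.functional as F

def attend(q, k, v):
    w = torch.softmax(q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5, dim=-1)
    return w @ v

n, m, d = 512, 32, 64                     # full length, compressed slots, head dim
pool = torch.nn.Linear(n, m, bias=False)  # learned compression map (assumption)

k_full, v_full = torch.randn(n, d), torch.randn(n, d)
q = torch.randn(8, d)      # training queries; the compression itself never sees q

k_c = pool(k_full.T).T     # (m, d) compressed keys
v_c = pool(v_full.T).T     # (m, d) compressed values

# offline supervision: compressed attention should match full attention
loss = F.mse_loss(attend(q, k_c, v_c), attend(q, k_full, v_full))
loss.backward()
```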
Adding or removing PII in LLM training can *unlock previously unextractable* info. Even if “John.Mccarthy” never reappears, enough Johns & Mccarthys during post-training can make it extractable later! New paper on PII memorization & n-gram overlaps: arxiv.org/abs/2502.15680
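The intuition is easy to see with a toy coverage check (my illustration, not the paper's metric): the full string never appears in any one document, yet every n-gram of it does.

```python
def ngrams(tokens, n=2):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def covered(target: str, corpus: list[str], n: int = 2) -> bool:
    seen = set()
    for doc in corpus:
        seen |= ngrams(doc.lower().split(), n)   # pool n-grams across documents
    return ngrams(target.lower().split(), n) <= seen

docs = ["john smith wrote to mccarthy",
        "emails from mccarthy yesterday",
        "smith mccarthy met in 1956"]
# no single doc contains the full name, but its pieces reassemble
print(covered("john smith mccarthy", docs))   # True
```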
Ever wonder how test-time compute would do in retrieval? 🤔 introducing ✨rank1✨ rank1 is distilled from R1 & designed for reranking. rank1 is state-of-the-art at complex reranking tasks in reasoning, instruction-following, and general semantics (often 2x RankLlama 🤯) 🧵
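Roughly, test-time-compute reranking looks like this sketch: the model reasons about each (query, passage) pair before committing to a relevance verdict, and passages are sorted by that verdict's probability. `llm` is a stand-in callable, not rank1's actual interface.

```python
from typing import Callable

def rerank(query: str, passages: list[str],
           llm: Callable[[str], float]) -> list[str]:
    def score(passage: str) -> float:
        prompt = (f"Query: {query}\nPassage: {passage}\n"
                  "Think step by step, then answer: is this passage "
                  "relevant? Answer true or false.")
        return llm(prompt)   # assumed: P("true") after the reasoning chain
    return sorted(passages, key=score, reverse=True)

# toy usage with a dummy scorer
print(rerank("capital of France",
             ["Paris is the capital of France.", "Bordeaux wine exports rose."],
             llm=lambda s: float("Paris" in s)))
```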
🚨 New Position Paper 🚨 Multiple choice evals for LLMs are simple and popular, but we know they are awful 😬 We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 🫠 Here's why MCQA evals are broken, and how to fix them 🧵
Scaling test-time compute for additional reasoning has dramatic impacts on a model's confidence in its answers! Find out more in our paper led by @williamjurayj.
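One way to picture the setup: check the model's confidence at increasing thinking budgets and only answer once it clears a threshold. `ask` below is a hypothetical interface returning (answer, confidence); the thresholding is my illustration, not the paper's exact protocol.

```python
def selective_answer(question: str, ask, budgets=(256, 1024, 4096),
                     threshold: float = 0.8):
    for budget in budgets:
        answer, conf = ask(question, max_thinking_tokens=budget)
        print(f"budget={budget}: confidence={conf:.2f}")
        if conf >= threshold:
            return answer   # confident enough to commit
    return None             # abstain if confidence never clears the bar

# dummy model whose confidence grows with its thinking budget
demo = lambda q, max_thinking_tokens: ("42", min(1.0, max_thinking_tokens / 2048))
print(selective_answer("6 * 7?", demo))   # answers at budget=4096
```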
People often claim they know when ChatGPT wrote something, but are they as accurate as they think? Turns out that while the general population is unreliable, those who frequently use ChatGPT for writing tasks can spot even "humanized" AI-generated text with near-perfect accuracy 🎯