Nathan Godey
@nthngdy
Working on the representations of LMs and pretraining methods @Inria Paris https://nathangodey.github.io/
🚀 New Paper Alert! 🚀 We introduce Q-Filters, a training-free method for efficient KV Cache compression! It is compatible with FlashAttention and can compress the cache during generation, which is particularly useful for reasoning models ⚡ ⬇️R1-Distill-Llama-8B with 128 KV pairs ⬇️ 🧵
🏆 Our @nvidia KV Cache Compression Leaderboard is now live! Compare state-of-the-art compression methods side-by-side with KVPress. See which techniques are leading in efficiency and performance. 🥇 huggingface.co/spaces/nvidia/…
We produced FineWeb-Edu style annotations for biomedical data and showed that it helps for continued pre-training and lets us target domains to improve on! Work led by the amazing @riantouchent and supervised by @DeVillemonte 🌟 Check out the thread and paper below 👇🏼
Excited to introduce 𝗕𝗶𝗼𝗺𝗲𝗱-𝗘𝗻𝗿𝗶𝗰𝗵𝗲𝗱 🎉, a new annotated biomedical dataset designed to tackle the scarcity of clinical data for NLP research! 133M paragraphs from PMC-OA annotated for type, domain, and educational quality and publicly available on @huggingface👇🧵
ModernBERT or DeBERTaV3? What's driving performance: architecture or data? To find out, we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects. Here are our findings:
I'm looking for 2 emergency reviewers for ACL 2025 in the Language Modeling and Efficient Methods for NLP tracks. Please reach out in my DMs if you are interested and can do a review within 24 hours 😬
*Q-Filters: Leveraging QK Geometry for KV Cache Compression* by @nthngdy @devoto_alessio @yuzhaouoe @PMinervini @bensagot We find directions in the KV cache geometry allowing us to compress the cache significantly with little degradation in performance. arxiv.org/abs/2503.02812
🎉 Excited to share “Generalizing from Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning” 📄 (arxiv.org/pdf/2502.15592) We propose "context synthesis": instead of generating instructions from long texts, we synthesize contexts for instructions—drawing…
We find that a single biased direction encodes a KV Cache selection mechanism in Self-Attention: a Key vector with a strong component along this direction results in its Key-Value pair being ignored by the Queries 🚀🚀🚀
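A minimal sketch of the idea behind that observation, not the paper's exact implementation: we estimate a dominant query direction per head offline (here via SVD of sampled query vectors, with a sign convention chosen so queries project positively), then score cached Keys by their projection onto that direction and keep only the highest-scoring KV pairs. Function names, the SVD-based estimation, and the top-k selection are illustrative assumptions; the actual Q-Filters estimation, sign handling, and eviction policy may differ.

```python
import torch

def estimate_q_filter(queries: torch.Tensor) -> torch.Tensor:
    # queries: (num_samples, head_dim) query vectors gathered offline for one head.
    # Use the principal right singular vector as the dominant query direction
    # (assumption: a single direction captures most of the query anisotropy).
    _, _, vh = torch.linalg.svd(queries, full_matrices=False)
    direction = vh[0]
    # Orient the filter so that queries project positively onto it on average.
    if (queries @ direction).mean() < 0:
        direction = -direction
    return direction  # unit-norm filter for this head

def compress_kv(keys: torch.Tensor, values: torch.Tensor,
                q_filter: torch.Tensor, budget: int):
    # keys / values: (seq_len, head_dim). Keep the `budget` KV pairs whose Keys
    # project most strongly onto the query direction, used here as a cheap
    # proxy for the attention mass they would receive.
    scores = keys @ q_filter
    keep = scores.topk(min(budget, keys.shape[0])).indices.sort().values
    return keys[keep], values[keep]
```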
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression