Aviv Bick
@avivbick
CS PhD student at Carnegie Mellon
The Transformer–SSM retrieval gap is driven by just a few heads! SSMs lag on tasks like MMLU (multiple-choice) and GSM8K (math) due to in-context retrieval challenges. But here’s the twist: just a handful of heads handle retrieval in both architectures. What we found 👇 1/
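To make the "handful of heads" idea concrete, here is a toy head-ablation probe: train a tiny attention model on a synthetic key–value recall task, then zero out one head at a time and watch recall accuracy. Everything here (model, task, sizes, scoring) is illustrative and hypothetical, not the paper's setup.

```python
# Toy sketch: score each head by ablating it and measuring the drop on an
# in-context recall task. Illustrative only; not the paper's models or metrics.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM, HEADS, SEQ, CUE = 64, 64, 4, 24, 1

class TinyMHA(nn.Module):
    """Single multi-head attention layer with an explicit per-head ablation knob."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Parameter(torch.randn(SEQ, DIM) * 0.02)
        self.qkv = nn.Linear(DIM, 3 * DIM)
        self.out = nn.Linear(DIM, DIM)
        self.readout = nn.Linear(DIM, VOCAB)

    def forward(self, tokens, ablate=()):
        B, T = tokens.shape
        x = self.embed(tokens) + self.pos[:T]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, T, HEADS, DIM // HEADS).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        att = torch.softmax(q @ k.transpose(-2, -1) / (DIM // HEADS) ** 0.5, dim=-1)
        heads = att @ v                            # (B, H, T, head_dim), one slice per head
        if ablate:
            heads = heads.clone()
            for h in ablate:                       # head ablation: zero this head's output
                heads[:, h] = 0.0
        merged = heads.transpose(1, 2).reshape(B, T, DIM)
        return self.readout(self.out(merged)[:, -1])   # predict from the last position

def batch(n=64):
    """Synthetic recall: the value stored right after the cue must be reproduced at the end."""
    toks = torch.randint(2, VOCAB, (n, SEQ))
    toks[:, 0] = CUE                               # cue appears once, early in the context
    toks[:, -1] = CUE                              # and again as the final query
    return toks, toks[:, 1]                        # target = the value following the cue

def accuracy(model, ablate=()):
    with torch.no_grad():
        toks, tgt = batch(512)
        return (model(toks, ablate).argmax(-1) == tgt).float().mean().item()

model = TinyMHA()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
for step in range(2000):                           # train the toy model on the recall task
    toks, tgt = batch()
    loss = nn.functional.cross_entropy(model(toks), tgt)
    opt.zero_grad(); loss.backward(); opt.step()

base = accuracy(model)
print(f"clean recall accuracy: {base:.2f}")
for h in range(HEADS):                             # see which heads the task concentrates in
    print(f"ablate head {h}: accuracy {accuracy(model, ablate=[h]):.2f}")
```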

I'll be giving the first H-Net talk this afternoon at 4:30-5 PT at the ES-FoMo workshop! Come support the fight against Big Token 🙏
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you. Let's meet some of our great speakers! 1/
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
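For intuition only, here is a generic sketch of the dynamic-chunking idea (score a boundary probability per raw byte, then pool each chunk into one vector for the outer model). It is not H-Net's actual module; the real routing, downsampling, and training are different.

```python
# Generic illustration of dynamic chunking over raw bytes, NOT H-Net's module.
import torch
import torch.nn as nn

DIM = 128

class DynamicChunker(nn.Module):
    """Scores a boundary probability per byte and mean-pools bytes into chunks."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(256, DIM)           # raw bytes, no tokenizer
        self.boundary = nn.Linear(DIM, 1)             # per-byte boundary score

    def forward(self, byte_ids, threshold=0.5):
        x = self.embed(byte_ids)                      # (T, DIM)
        p = torch.sigmoid(self.boundary(x)).squeeze(-1)   # (T,) boundary probabilities
        # Hard chunking for illustration; a trainable version would need a
        # differentiable relaxation (e.g. soft assignment or straight-through).
        is_boundary = p > threshold
        chunks, current = [], [x[0]]
        for t in range(1, byte_ids.numel()):
            if is_boundary[t]:
                chunks.append(torch.stack(current).mean(0))
                current = []
            current.append(x[t])
        chunks.append(torch.stack(current).mean(0))
        return torch.stack(chunks)                    # (num_chunks, DIM) for the main model

text = "tokenization has been the final barrier".encode("utf-8")
byte_ids = torch.tensor(list(text))
chunker = DynamicChunker()
print(chunker(byte_ids).shape)   # chunk count depends on the (here untrained) boundary scores
```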
I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit). In a few days, we'll release what I believe is the next major advance for architectures.
Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix that enables length generalization on sequences of up to 256k tokens, with no need to change the architecture!
We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? Introducing Log-Linear Attention with:
- Log-linear time training
- Log-time inference (in both compute and memory)
- Hardware-efficient Triton kernels
We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. It's surprising how deep you can go, and how beautiful it becomes. With (and only thanks to) the amazing Alexandre and @BachFrancis: arxiv.org/pdf/2502.09287
New work! 🚨 Recurrent LLMs like Mamba and RWKV can efficiently process millions of tokens, yet still underperform on real-world long-context tasks. What's holding them back? 🤔 And how can a lightweight fix boost their performance by 35% on LongBench? 👇🏼🧵 GitHub:…
✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
At #ICLR2025 to present two recent works on reasoning distillation and efficient VLM inference with my wonderful collaborators! Excited to discuss efficient deep learning🚀, methods and architectures, and reasoning for LLMs🧠; DMs open! 👇 Summary of the two works below! 1/3
Are you a frontier lab investing untold sums in training? Are you trying to stay competitive? Are you finding that your competitors' models are ... thinking a bit too much like yours? Then antidistillation.com might be for you! @sama @elonmusk
Scores 4.17% on ARC-AGI 2 on Kaggle! 🔗 Code provided in the Kaggle notebook: kaggle.com/code/iliao2345…
Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4
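For intuition, here is a deliberately naive sketch of the test-time-optimization pattern: fit a small network to one puzzle's demonstration pairs by gradient descent at inference time, then apply it to the test input. The actual method's architecture and objective are different; the toy below assumes same-shape grids, a per-cell mapping, and made-up names throughout.

```python
# Naive sketch of "inference-time gradient descent on a single puzzle":
# no pretraining, no external data, just the puzzle's own demo pairs.
import torch
import torch.nn as nn

NUM_COLORS = 10                                 # ARC uses colors 0–9

class CellNet(nn.Module):
    """Maps each input cell's one-hot color (plus its 3x3 neighborhood) to an output color."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(NUM_COLORS, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, NUM_COLORS, 1),
        )

    def forward(self, grid):                    # grid: (H, W) ints in [0, 10)
        x = nn.functional.one_hot(grid, NUM_COLORS).float()
        x = x.permute(2, 0, 1).unsqueeze(0)     # (1, C, H, W)
        return self.net(x).squeeze(0)           # (C, H, W) logits per cell

def solve_puzzle(demo_pairs, test_input, steps=300, lr=1e-2):
    """Gradient descent on this one puzzle's demos, then predict the test output."""
    model = CellNet()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = sum(
            nn.functional.cross_entropy(model(inp).unsqueeze(0), out.unsqueeze(0))
            for inp, out in demo_pairs
        )
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return model(test_input).argmax(0)      # predicted output grid

# Toy "puzzle": the hidden transformation is simply color + 1 (mod 10).
demos = [(g, (g + 1) % NUM_COLORS) for g in (torch.randint(0, 10, (5, 5)) for _ in range(3))]
test_in = torch.randint(0, 10, (5, 5))
print(solve_puzzle(demos, test_in))
```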
🧬 Meet Lyra, a new paradigm for accessible, powerful modeling of biological sequences. Lyra is a lightweight SSM achieving SOTA performance across DNA, RNA, and protein tasks—yet up to 120,000x smaller than foundation models (ESM, Evo). Bonus: you can train it on your Mac. read…