Ali Behrouz
@behrouz_ali
Research Intern @Google, Ph.D. Student @Cornell_CS, interested in machine learning and understanding intelligence.
Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative? Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans…
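Reading the announcement, "learning to memorize at test time" amounts to running gradient descent on a small memory network inside the forward pass: each token's surprise (the gradient of an associative recall loss) updates the memory's weights, with a momentum term and a forget/decay term. A minimal sketch of that kind of update, assuming PyTorch; the function, shapes, and hyperparameters are illustrative, not the released Titans code:

```python
import torch
import torch.nn as nn

def memory_step(memory, k_t, v_t, surprise, lr=0.1, eta=0.9, alpha=0.01):
    # One test-time update of the neural memory on a key/value pair (k_t, v_t):
    # gradient step on the associative loss ||memory(k_t) - v_t||^2, with a
    # momentum buffer ("surprise") and weight decay ("forgetting").
    loss = (memory(k_t) - v_t).pow(2).sum()
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, g, s in zip(memory.parameters(), grads, surprise):
            s.mul_(eta).add_(g, alpha=-lr)   # S_t = eta*S_{t-1} - lr*grad
            p.mul_(1 - alpha).add_(s)        # M_t = (1-alpha)*M_{t-1} + S_t
    return loss.item()

# Usage: a small MLP as the memory module, updated as tokens stream in.
memory = nn.Sequential(nn.Linear(64, 128), nn.SiLU(), nn.Linear(128, 64))
surprise = [torch.zeros_like(p) for p in memory.parameters()]
memory_step(memory, torch.randn(64), torch.randn(64), surprise)
```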

Fast, Numerically Stable, and Auto-Differentiable Spectral Clipping via Newton-Schulz Iteration

Hi all, I'm bacc. I have a lot to talk about, but let's start with this fun side-project. Here I'll talk about novel (?) ways to compute:
1. Spectral Clipping (discussed in Rohan's…
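For the building block in the title: the Newton-Schulz iteration manipulates a matrix's singular values using only matmuls, so it is fast on accelerators and differentiates cleanly, unlike an SVD. A minimal sketch of the textbook cubic variant, which computes the polar factor (the post's actual clipping construction and coefficients may well differ):

```python
import torch

def msign(A: torch.Tensor, steps: int = 12) -> torch.Tensor:
    # Cubic Newton-Schulz iteration for the polar factor of A = U S V.T:
    # each step maps every singular value s -> 1.5*s - 0.5*s**3, which
    # converges to 1 for s in (0, sqrt(3)), so X approaches U @ V.T.
    X = A / A.norm()  # Frobenius scaling guarantees all singular values <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.mT @ X
    return X
```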
Doing some math to cleanse the timelinez.

Why does the loss blow up? A question to deepthink. So an attempt: why not clip the singular values of the update?
σ > 1: clip to 1
σ ≤ 1: return σ

Naive implementation:
Update = U S V.T
Update_clipped = U clip(S, 1) V.T

How to make it…
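The naive implementation from the tweet, spelled out in PyTorch; the full SVD is the slow, numerically touchy step that the Newton-Schulz route above is meant to avoid:

```python
import torch

def spectral_clip_naive(update: torch.Tensor, max_sv: float = 1.0) -> torch.Tensor:
    # Update = U S V.T; Update_clipped = U clip(S, max_sv) V.T.
    # Singular values above max_sv are clipped; the rest pass through unchanged.
    U, S, Vh = torch.linalg.svd(update, full_matrices=False)
    return (U * torch.clamp(S, max=max_sv)) @ Vh
```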
Proud to announce an official Gold Medal at #IMO2025🥇 The IMO committee has certified the result from our general-purpose Gemini system—a landmark moment for our team and for the future of AI reasoning. deepmind.google/discover/blog/… (1/n) Highlights in thread:
📄 New Paper Alert! ✨ 🚀 Mixture of Recursions (MoR): smaller models • higher accuracy • greater throughput. Across 135M–1.7B params, MoR carves a new Pareto frontier: equal training FLOPs yet lower perplexity, higher few-shot accuracy, and more than 2x throughput.…
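The mechanism, as I read the paper: a single shared stack of layers is applied recursively, with a lightweight router deciding per token how many recursion passes it gets, which is where the parameter and throughput savings come from. A toy sketch of that control flow; the class, router, and shapes are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class ToyMoR(nn.Module):
    # One parameter-shared block reused up to max_depth times; a router
    # assigns each token a recursion depth, so easy tokens exit early.
    # Argmax routing is for illustration only; training needs a
    # differentiable or auxiliary-loss router.
    def __init__(self, d_model: int = 256, max_depth: int = 3):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.router = nn.Linear(d_model, max_depth)  # per-token depth logits
        self.max_depth = max_depth

    def forward(self, x):                        # x: (batch, seq, d_model)
        depth = self.router(x).argmax(-1) + 1    # depth in 1..max_depth per token
        for step in range(self.max_depth):
            active = (depth > step).unsqueeze(-1)      # tokens still recursing
            x = torch.where(active, self.block(x), x)  # others pass through
        return x
```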
What makes attention the critical component for most advances in LLMs and what holds back long-term memory modules (RNNs)? Can we strictly generalize Transformers? Presenting Atlas (A powerful Titan): a new architecture with long-term in-context memory that learns how to…
❓ Are LLMs actually problem solvers or just good at regurgitating facts? 🚨 New Benchmark Alert! We built HeuriGym to benchmark whether LLMs can craft real heuristics for hard real-world combinatorial optimization problems. 🛞 We're open-sourcing it all:
✅ 9 problems
✅ Iterative…
The scope of what counts as research has narrowed considerably.
My favorite thing an old OpenAI buddy of mine told me: whenever he hears that someone is a “great AI researcher”, he just spends 5 minutes looking at that person's PRs and wandb runs. People can do all kinds of politics and optical shenanigans, but at the end of the…
Very interesting work!
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation.
🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%.
🌐 Website: multiverse4fm.github.io
🧵 1/n
The ratio of science to engineering in AI is approaching zero.
Last week, @Google dropped a paper on ATLAS, a new architecture that reimagines how models learn and use memory. Unfortunately, it flew under everyone's radar, but it shouldn't have! So what's Atlas bringing to the table?
▪️ Active memory via Google's so-called Omega rule. It…
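As far as I can tell from the paper, the Omega rule generalizes one-pair (Delta-rule style) memory updates by optimizing the memory against a sliding window of recent key/value pairs. A rough sketch under that reading, assuming PyTorch; the window handling, loss, and plain gradient step are my simplifications, not the paper's exact update:

```python
import torch

def omega_style_update(memory, window, lr=0.05):
    # One sliding-window memory update: a gradient step on the summed
    # associative loss over the last few (key, value) pairs, rather than
    # on the current pair alone as in a Delta-rule update.
    loss = sum((memory(k) - v).pow(2).sum() for k, v in window)
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, g in zip(memory.parameters(), grads):
            p.add_(g, alpha=-lr)
```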