Sabri Eyuboglu
@EyubogluSabri
Working on language model memory. CS PhD student @Stanford working with @HazyResearch and @james_y_zou. 🪬
When we put lots of text (e.g., a code repo) into LLM context, cost soars because of the KV cache's size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x…
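To make the idea concrete, here is a self-contained toy sketch, not the paper's actual recipe: a short trainable key/value prefix is optimized so that frozen attention over it matches frozen attention over a much longer "document" KV cache. The dimensions, the plain MSE objective, and the random stand-in for self-study queries are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, doc_len, cache_len, q_len = 64, 1024, 32, 16

# Frozen stand-in for the KV cache a long document would occupy.
doc_k, doc_v = torch.randn(doc_len, d), torch.randn(doc_len, d)

# The trainable "cartridge": 32 KV slots instead of 1024 (a ~32x reduction).
cache_k = torch.randn(cache_len, d, requires_grad=True)
cache_v = torch.randn(cache_len, d, requires_grad=True)
opt = torch.optim.Adam([cache_k, cache_v], lr=1e-2)

def attend(q, k, v):
    # Plain scaled dot-product attention.
    w = torch.softmax(q @ k.T / d ** 0.5, dim=-1)
    return w @ v

for step in range(2000):
    q = torch.randn(q_len, d)            # stand-in for "self-study" queries
    target = attend(q, doc_k, doc_v)     # teacher: attends over the full document
    pred = attend(q, cache_k, cache_v)   # student: attends over the small cache
    loss = F.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of the toy: the compressed cache is learned offline, once per corpus, so at serving time you pay the memory cost of 32 KV slots rather than 1024.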

Check out Tokasaurus on Modal to make Llama-1B brrr! This repeated sampling example shows off two engine features that are important for serving small models: very low CPU overhead and automatic shared prefix exploitation with Hydragen.
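For a sense of what driving an engine like this looks like, here is a minimal repeated-sampling client, assuming an OpenAI-compatible completions endpoint; the base URL, port, and model name below are placeholders, not Tokasaurus defaults.

```python
from openai import OpenAI

# Placeholder endpoint: point this at wherever the server is listening.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    prompt="Q: Janet has 3 apples and buys 5 more. How many apples? A:",
    n=128,             # many samples over one shared prompt
    max_tokens=256,
    temperature=0.8,
)
answers = [choice.text for choice in resp.choices]
```

Because all n samples share one prompt, an engine with Hydragen-style shared-prefix attention can batch attention over the common prefix instead of recomputing it per sample.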
Tokasaurus, the "little LLM engine that could" by @jordanjuravsky and @EyubogluSabri of @HazyResearch/@ScalingIntelLab, is capable of some pretty impressive perf. We replicated their report of >80k tok/s for 16bit LLaMA 3.1 8B on Large Language Monkeys GSM8K - and you can too!
Tokasaurus, the "little LLM engine that could" by @jordanjuravsky and @EyubogluSabri of @HazyResearch/@ScalingIntelLab, is capable of some pretty impressive perf. We replicated their report of >80k tok/s for 16bit LLaMA 3.1 8B on Large Language Monkeys GSM8K - and you can too!
Thank you for the kind words -- we can't either! We're really excited about models learning new things and remembering their experience, and we think that Cartridges is a step towards that future. My co-author @EyubogluSabri will be giving a talk on Cartridges at @ESFoMo at…
can't stop thinking about this one. insanely elegant, seems insanely powerful
Cartridges could be this "missing learning paradigm" Karpathy talks about: 1) an agent does tasks and collects memories that help it do better via ICL, 2) those memories are trained/compacted into Cartridges, 3) Cartridges are shared/composed/RAG-ed between other agents
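A toy sketch of that loop, where every function is a stand-in to show the shape of the idea, not an API from the Cartridges repo:

```python
memories: list[str] = []

def run_task(task: str, memories: list[str]) -> str:
    # 1) The agent conditions on its accumulated memories via ICL.
    context = "\n".join(memories)
    return f"result({task}, {len(context)} chars of memory in context)"

def compact(memories: list[str]) -> str:
    # 2) Memories get trained/compacted; in Cartridges this would be
    #    gradient-based compression into a small KV cache, not a string.
    return f"cartridge<{len(memories)} memories>"

for task in ["task-a", "task-b", "task-c"]:
    memories.append(f"{task}: {run_task(task, memories)}")

cartridge = compact(memories)
# 3) The cartridge can now be shared, composed, or retrieved (RAG-style)
#    by other agents instead of replaying the raw memories.
print(cartridge)
```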
found my weekend experiment
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
ES-FoMo is back tomorrow! Come join us in East Exhibition Hall A bright and early at 8:30AM for a great slate of invited talks, orals, spotlight lightning talks, and 150 posters!
honestly i would much rather have an open-source 4.1-mini than an open-source o3-mini
Thanks @willccbb!! For those at ICML, I'm giving a talk on Cartridges at the ES-FoMo workshop on Saturday at 10:45 -- come through!! Excited to talk memory, test-time training, and continual learning!
Agreed. And very interesting in prod for static/semi-static corpora, in our tests...
thanks @willccbb!! check out Cartridges at ICML ES-FoMo this week :) excited for what's next
We’re presenting Minions at ICML starting now until 1:30pm at E-2907 — come by and chat!!
How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (@ollama) with frontier LLMs in the cloud (@openai, @together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost…
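A drastically simplified sketch of that division of labor, not the actual Minions protocol; the model names and the naive chunking heuristic are assumptions. The local model burns cheap on-device tokens reading the long document, and the cloud model only ever sees a short digest.

```python
import ollama
from openai import OpenAI

cloud = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def minion_answer(question: str, document: str, chunk_size: int = 4000) -> str:
    # Local model (free on-device tokens) reads the long document chunk by chunk.
    notes = []
    for i in range(0, len(document), chunk_size):
        chunk = document[i:i + chunk_size]
        reply = ollama.chat(model="llama3.2", messages=[{
            "role": "user",
            "content": f"Extract any facts relevant to: {question}\n\n{chunk}",
        }])
        notes.append(reply["message"]["content"])

    # Cloud model (expensive tokens) only sees the short notes, not the document.
    resp = cloud.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nNotes from a local reader:\n"
                       + "\n".join(notes),
        }],
    )
    return resp.choices[0].message.content
```

The cost savings come from the asymmetry: the token-heavy reading happens locally, so cloud spend scales with the size of the digest rather than the size of the document.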
Minions poster 🥹 Thursday 11am Pacific, East Exhibition Hall A-B, E-2907
Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!
At #ICML2025 in Vancouver 🇨🇦 this week, presenting some work from my first year at Stanford! Come find me at posters or just around the conference!
Thursday: KernelBench: Can LLMs Write Efficient GPU Kernels? 11AM East E-2010
Saturday: Kevin: Multi-Turn RL for Generating…
🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to @COLM_conf 2025! We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
We’re thrilled to collaborate with the @HazyResearch lab at @StanfordAILab, led by Chris Ré, to power Minions, their cutting-edge agentic framework tackling the cost-accuracy tradeoff in modern AI systems. This innovation is enabled on AMD Ryzen AI, thanks to seamless integration with…