Simran Arora
@simran_s_arora
building ai systems, cs phd @stanfordailab @hazyresearch, incoming asst. prof. of cms @caltech
Wish writing AI kernels was like writing PyTorch??? Enter ThunderKittens 0.002: for simpler, faster, more adorable AI kernels! We use TK to provide 10-40% faster attention backwards, cuBLAS-speed GEMMs, 8x faster state space models, 14x faster linear attentions – averaging <200…

👑 We’re #1! Sonic-2 leads @Labelbox’s Speech Generation Leaderboard, coming out on top in speech quality, word error rate, and naturalness. Build your real-time voice apps with the 🥇 best voice AI model. ➡️ labelbox.com/leaderboards/s…
Join us at ES-FoMo tomorrow!! It's a great lineup!
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
Cartridges could be this "missing learning paradigm" Karpathy talks about
1) agent does tasks, collects memories that help it do better via ICL
2) memories are trained / compacted into Cartridges
3) Cartridges shared / composed / RAG-ed between other agents
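For intuition, here is a hedged toy sketch of that three-step loop. Every name and class below is hypothetical, invented for illustration; it is not the Cartridges codebase or API.

```python
# Hedged toy sketch of the three steps above; all names here are hypothetical,
# not the Cartridges codebase or API.
from dataclasses import dataclass, field

@dataclass
class Cartridge:
    """Stands in for a small trained KV cache distilled from an agent's memories."""
    source_docs: list
    params: dict = field(default_factory=dict)

def run_task_with_icl(task, memories):
    # 1) agent does tasks, stuffing raw memories into context (plain ICL)
    prompt = "\n".join(memories) + "\n" + task
    return f"answer to {task!r} using a {len(prompt)}-char prompt"

def compact_into_cartridge(memories):
    # 2) memories are trained / compacted offline into a Cartridge
    #    (the real method does this with the self-study training recipe)
    return Cartridge(source_docs=list(memories), params={"kv": "trained offline"})

def compose(cartridges):
    # 3) Cartridges shared / composed / RAG-ed between agents
    return Cartridge(source_docs=[d for c in cartridges for d in c.source_docs])

memories = ["notes from task A", "notes from task B"]
print(run_task_with_icl("a new task", memories))
mine = compact_into_cartridge(memories)
shared = compose([mine, Cartridge(source_docs=["another agent's notes"])])
print(len(shared.source_docs))  # 3: memory composed from two agents
```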
can't stop thinking about this one. insanely elegant, seems insanely powerful
thanks @willccbb!! check out Cartridges at ICML ES-FoMo this week :) excited for what's next
On Saturday we’re hosting the ES-FoMo workshop, with @tri_dao, @dan_biderman, @simran_s_arora, @m_ryabinin and others - we’ve got a great slate of papers and invited talks, come join us! (More on the speakers soon) x.com/esfomo/status/… 2/
ES-FoMo is back for round three at #ICML2025! Join us in Vancouver on Saturday July 19 for a day dedicated to Efficient Systems for Foundation Models: from 💬reasoning models to🖼️scalable multimodality, 🧱efficient architectures, and more! Submissions due May 26! More below 👇
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
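To make "dynamic chunking" concrete, here is a heavily simplified toy, not the H-Net architecture itself: a learned boundary predictor scores each byte, consecutive bytes are pooled into variable-length chunks, and downstream layers would operate on chunk states instead of tokenizer output. All module names and shapes below are made up for illustration.

```python
# Toy illustration of dynamic chunking (NOT the H-Net architecture): score each
# byte for "start a new chunk here?", then mean-pool bytes into chunk states.
import torch
import torch.nn as nn

class DynamicChunker(nn.Module):
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_model)
        self.boundary = nn.Linear(d_model, 1)  # predicts chunk-boundary logits per byte

    def forward(self, byte_ids: torch.Tensor):
        h = self.byte_embed(byte_ids)                                  # (L, d)
        is_boundary = torch.sigmoid(self.boundary(h)).squeeze(-1) > 0.5
        is_boundary[0] = True                                          # first byte opens a chunk
        chunk_id = torch.cumsum(is_boundary.long(), dim=0) - 1         # which chunk each byte joins
        n_chunks = int(chunk_id.max().item()) + 1
        # mean-pool byte states into chunk states; downstream layers see chunks, not bytes
        chunks = torch.zeros(n_chunks, h.size(-1)).index_add_(0, chunk_id, h)
        counts = torch.zeros(n_chunks).index_add_(0, chunk_id, torch.ones_like(chunk_id, dtype=torch.float))
        return chunks / counts.unsqueeze(-1), chunk_id

text = "tokenization is the final barrier".encode()
chunker = DynamicChunker()
chunk_states, assignment = chunker(torch.tensor(list(text)))
print(chunk_states.shape, assignment.tolist())
```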
1/10 ML can solve PDEs – but precision🔬is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵How it works:
🤖 Household robots are becoming physically viable. But interacting with people in the home requires handling unseen, unconstrained, dynamic preferences, not just a complex physical domain. We introduce ROSETTA: a method to generate reward for such preferences cheaply. 🧵⬇️
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
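The general pattern here is verification-based selection: score each candidate generation with several cheap verifiers and keep the candidate with the best combined score. A minimal sketch under my own assumptions (toy verifiers and naive uniform weighting, not Weaver's actual implementation):

```python
# Hedged sketch of combining weak verifiers to select among candidate answers.
# The verifiers and weights below are placeholders, not Weaver's components.
import numpy as np

def select_answer(candidates, verifiers, weights=None):
    """candidates: list of answer strings; verifiers: list of functions mapping
    an answer -> score in [0, 1]; weights: optional per-verifier weights."""
    scores = np.array([[v(c) for v in verifiers] for c in candidates])  # (n_cand, n_verif)
    if weights is None:
        weights = np.ones(scores.shape[1]) / scores.shape[1]           # naive uniform combination
    combined = scores @ np.asarray(weights)
    return candidates[int(combined.argmax())], combined

# toy stand-ins for a reward model and an LM judge
length_prior = lambda ans: min(len(ans) / 100.0, 1.0)
keyword_judge = lambda ans: 1.0 if "42" in ans else 0.0

best, combined = select_answer(["the answer is 42", "not sure"],
                               [length_prior, keyword_judge])
print(best, combined)
```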
Thrilled to share that I'll be starting as an Assistant Professor at Georgia Tech (@ICatGT / @GTrobotics / @mlatgt) in Fall 2026. My lab will tackle problems in robot learning, multimodal ML, and interaction. I'm recruiting PhD students this next cycle – please apply/reach out!
Struggling with context management? Wish you could just stick it all in your model? We’ve integrated Cartridges, a new method of leveraging sleep-time compute for learning long contexts, into Tokasaurus, an inference engine optimized for high-throughput 🧵
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n
Claude not able to continue my research chat about context compression papers because it ran out of context because it doesn't use context compression.
Looks like a very slick way to tune and cheaply serve custom models! If I were building on this, I’d try to find a better way to initialize the cache. You can initialize LoRA as a no-op and let backprop handle the rest, but KV-tuning methods need weird initialization hacks.
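The LoRA point is easy to see in code: zero-initializing the B matrix makes the adapter an exact no-op at step 0, so training starts from the base model's behavior. A minimal PyTorch sketch of generic LoRA (not tied to any particular KV-tuning method):

```python
# Minimal LoRA sketch: with B initialized to zeros, the adapted layer matches
# the frozen base layer exactly before any training happens.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zeros -> no-op at init
    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

base = nn.Linear(16, 16)
lora = LoRALinear(base)
x = torch.randn(2, 16)
print(torch.allclose(lora(x), base(x)))  # True: identical to the base model before training
```

A trainable KV prefix has no equally clean "do nothing" initialization, which is why the tweet above calls the required hacks weird.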
When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x…
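A toy sketch of the idea under my own simplifying assumptions (a single attention call over random tensors; nothing like the released recipe, which trains the cache against the actual LM on synthetic conversations about the corpus): learn a much smaller set of key/value vectors offline so that attention over the small cache imitates attention over the full one.

```python
# Hedged toy sketch: distill a long frozen KV cache into a short trainable one.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, full_len, small_len = 64, 1024, 32          # small cache is 32x shorter

# frozen "document" cache we want to compress (stands in for the real corpus KV)
K_full = torch.randn(full_len, d)
V_full = torch.randn(full_len, d)

# trainable tiny cache (the "cartridge" analogue in this toy)
K_small = torch.nn.Parameter(torch.randn(small_len, d) * 0.02)
V_small = torch.nn.Parameter(torch.zeros(small_len, d))
opt = torch.optim.Adam([K_small, V_small], lr=1e-2)

for step in range(500):
    q = torch.randn(16, d)                     # synthetic "self-study" queries
    with torch.no_grad():                      # teacher: attend over the full cache
        target = F.scaled_dot_product_attention(q[None], K_full[None], V_full[None])[0]
    pred = F.scaled_dot_product_attention(q[None], K_small[None], V_small[None])[0]
    loss = F.mse_loss(pred, target)            # distill full-cache behavior into the tiny cache
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distillation loss: {loss.item():.4f}")
```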
Today we shipped a new real-time API for streaming speech-to-text (a new family of models called Ink) that’s extremely fast, cheap, and designed specifically for voice agents. We’re cooking hard, lots more releases coming soon 🧑‍🍳
Building voice agents? Meet Ink-Whisper: the fastest, most affordable streaming speech-to-text model.
🌎 Optimized for accuracy in real-world conditions
👯 Pair with our Sonic text-to-speech → fastest duo in voice AI
🔌 Plugs into @Vapi_AI, @pipecat_ai, @livekit
Read more:…
I like this idea very much and have long advocated for something like this. Synthetically enriched «KV prefix» is a natural augment to modern long context models.
Cartridges: Storing long contexts in tiny caches with self-study
- train-once, reusable memory via SELF-STUDY
- 38.6× less memory, 26.4× higher throughput
- extends context to 484k, composes across corpora
- outperforms LoRA, DuoAttention, and standard ICL
BLOG:…
Trading online compute for offline compute is an under-discussed axis of scaling, but one that will be increasingly relevant going forward.
Cartridges = an interesting offline alternative to regular ICL for frequently used large text corpora. 👇 A lot to learn in this awesome work imo. (another one from a Hazy Research team) Bravo to the team 👏
There’s been tons of work on KV-cache compression and on KV-cache-free Transformer alternatives (SSMs, linear attention) for long context, but we know there’s no free lunch with these methods. The quality-memory tradeoffs are annoying. *Is all lost?* Introducing CARTRIDGES:…
more evidence that kv caches have a lot of room for compression.