Jordan Juravsky
@jordanjuravsky
AI PhD Student at Stanford, proud former goose at UWaterloo.
Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models. (Joint work with @achakravarthy01, @ryansehrlich, @EyubogluSabri, @brad19brown, @jshetaye,…

[1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.
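For readers curious what spectral regulation can look like in practice, here is a minimal sketch, assuming PyTorch and a plain power-iteration estimate of the top singular value; the paper's actual recipe may differ.

```python
# Minimal sketch (assumptions: PyTorch; not the paper's exact recipe):
# cap a weight matrix's spectral norm via power iteration, so each linear
# map is 1-Lipschitz without layer norm, QK norm, or logit softcapping.
import torch

def spectrally_cap(W: torch.Tensor, n_iters: int = 5, max_sigma: float = 1.0) -> torch.Tensor:
    """Rescale W so its largest singular value is at most max_sigma."""
    u = torch.randn(W.shape[0], device=W.device)
    v = torch.randn(W.shape[1], device=W.device)
    for _ in range(n_iters):  # power iteration for the top singular pair
        v = torch.nn.functional.normalize(W.T @ u, dim=0)
        u = torch.nn.functional.normalize(W @ v, dim=0)
    sigma = torch.dot(u, W @ v)  # estimated top singular value
    return W / torch.clamp(sigma / max_sigma, min=1.0)  # only shrink, never grow

W = torch.randn(512, 512) * 0.2
print(torch.linalg.matrix_norm(spectrally_cap(W), ord=2))  # ~min(sigma_max(W), 1.0)
```

Composed with 1-Lipschitz nonlinearities, capping every layer this way bounds the Lipschitz constant of the whole stack, which is the stability property the tweet is pointing at.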
✨ Test-Time Scaling for Robotics ✨ Excited to release 🤖 RoboMonkey, which characterizes test-time scaling laws for Vision-Language-Action (VLA) models and introduces a framework that significantly improves the generalization and robustness of VLAs! 🧵(1 / N) 🌐 Website:…
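The core test-time-scaling loop is easy to picture: sample several candidate actions, score them with a verifier, act on the best. A minimal sketch below; `vla_policy` and `action_verifier` are hypothetical stand-ins, not RoboMonkey's actual API.

```python
# Sketch of the sample-then-verify loop behind test-time scaling for a
# Vision-Language-Action policy; the objects here are hypothetical stand-ins.
import numpy as np

def best_of_n_action(obs, instruction, vla_policy, action_verifier, n: int = 16):
    """Sample n candidate actions; execute the one the verifier scores highest."""
    candidates = [vla_policy.sample(obs, instruction) for _ in range(n)]
    scores = [action_verifier.score(obs, instruction, a) for a in candidates]
    return candidates[int(np.argmax(scores))]
```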
1/10 ML can solve PDEs – but precision🔬is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine precision (down to 10⁻¹² RMSE) on benchmark PDEs. 🧵How it works:
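For context, a generic physics-informed loss on a toy 1D Poisson problem looks like the sketch below; this illustrates the setup, not BWLer's architecture, and assumes PyTorch.

```python
# Generic physics-informed loss for the 1D Poisson problem u''(x) = f(x);
# illustrates the training signal, not BWLer's architecture.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def pde_residual_loss(x: torch.Tensor) -> torch.Tensor:
    """Mean-squared PDE residual u'' - f at collocation points x."""
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    f = -torch.sin(x)  # manufactured source: u(x) = sin(x) solves u'' = f
    return ((d2u - f) ** 2).mean()

loss = pde_residual_loss(torch.rand(128, 1))  # plus boundary terms in practice
```

Driving this residual to ~10⁻¹² rather than the usual ~10⁻³ is exactly the precision gap the thread is about.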
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
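The selection step can be pictured as a weighted vote over verifier scores; a minimal sketch, with hand-picked illustrative weights where Weaver would fit them from data:

```python
# Sketch of the weak-verifier ensemble idea: score each candidate answer
# with several verifiers and take a weighted vote. Weights are illustrative.
import numpy as np

def select_answer(candidates, verifiers, weights):
    """candidates: generated answers; verifiers: callables answer -> score;
    weights: per-verifier reliability (e.g., fit on a small labeled set)."""
    scores = np.array([[v(c) for v in verifiers] for c in candidates])
    return candidates[int(np.argmax(scores @ np.asarray(weights)))]

# Toy usage with two hypothetical verifiers:
answers = ["42", "41"]
verifiers = [lambda a: float(a == "42"), lambda a: len(a) / 10]
print(select_answer(answers, verifiers, weights=[0.8, 0.2]))  # -> "42"
```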
Cartridges, powered by Tokasaurus! 🤝⚡️🦖
Struggling with context management? Wish you could just stick it all in your model? We’ve integrated Cartridges, a new method of leveraging sleep-time compute for learning long contexts, into Tokasaurus, an inference engine optimized for high throughput 🧵
A bit late to the party, but our paper on predictable inference-time / test-time scaling was accepted to #icml2025 🎉🎉🎉 TLDR: Best of N was shown to exhibit power (polynomial) law scaling (left), but the math suggests one should expect exponential scaling (center). We show how to…
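To see the tension, here is the standard back-of-envelope argument (my notation; per-sample independence is the usual idealization):

```latex
% Per problem: if each of N i.i.d. samples is correct with probability p,
% best-of-N fails only when every sample fails -- exponential decay in N.
\[
  \mathbb{P}(\text{BoN fails}) = (1-p)^{N} = e^{N \log(1-p)}
\]
% Aggregated over a population of problems with difficulty p ~ D, the
% mixture of exponentials can decay polynomially instead (a power law).
\[
  \mathbb{P}(\text{BoN fails}) = \mathbb{E}_{p \sim \mathcal{D}}\!\left[(1-p)^{N}\right]
\]
```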
Excited to be presenting our new work, HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation, at #CVPR2025 this week. VAR (Visual Autoregressive Modelling) introduced a very nice way to formulate autoregressive image generation as a next-scale prediction task (from…
When we put lots of text (e.g. a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on average by 39x…
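A toy-scale sketch of the mechanism, assuming PyTorch: freeze the model and treat a short KV prefix (the "cartridge") as trainable parameters, fit offline so attending into it reproduces what the full-document cache would give. Shapes and the training target below are illustrative, not the paper's self-study recipe.

```python
# Toy sketch: a trainable KV prefix stands in for a huge document cache.
import torch
import torch.nn.functional as F

d, prefix_len, seq_len = 64, 8, 16
queries = torch.randn(seq_len, d)   # frozen model's queries for a probe prompt
target = torch.randn(seq_len, d)    # stand-in for outputs with full docs in context

k_prefix = torch.randn(prefix_len, d, requires_grad=True)  # trainable keys
v_prefix = torch.randn(prefix_len, d, requires_grad=True)  # trainable values
opt = torch.optim.Adam([k_prefix, v_prefix], lr=1e-2)

for step in range(200):  # offline "self-study"-style fitting loop
    attn = F.softmax(queries @ k_prefix.T / d ** 0.5, dim=-1)
    loss = F.mse_loss(attn @ v_prefix, target)
    opt.zero_grad(); loss.backward(); opt.step()
```

The memory win comes from `prefix_len` being far smaller than the token length of the original documents.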
In the test time scaling era, we all would love a higher throughput serving engine! Introducing Tokasaurus, an LLM inference engine for high-throughput workloads with large and small models! Led by @jordanjuravsky, in collaboration with @HazyResearch and an amazing team!
In fact, the test-time experiments and synthetic data generation for KernelBench were only possible with Tokasaurus!
I’ve LOVED 🫶 using Tokasaurus 🦖🔥 for my research over the last few months! @jordanjuravsky and team have made it so easy to use and super high throughput across a variety of models and hardware configurations, making these test-time / throughput-heavy experiments even possible…
Local LLMs *privately* collaborating with smarter cloud LLMs, as if you never left your laptop. Pure joy to work with @ollama.
3 months ago, Stanford's Hazy Research lab introduced Minions, a project that connects Ollama to frontier cloud models to reduce cloud costs by 5-30x while achieving 98% of frontier model accuracy. Secure Minion turns an H100 into a secure enclave, where all memory and…
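The local/cloud split at the heart of the cost savings can be sketched in a few lines; `local_llm` and `cloud_llm` below are hypothetical prompt-to-text callables, not the actual Minions or Ollama API.

```python
# Sketch of a Minions-style protocol: the small local model reads the
# private document; only short distilled notes ever reach the cloud model.
def answer_privately(question: str, document: str, local_llm, cloud_llm,
                     chunk_size: int = 4000) -> str:
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    # Cheap, private pass: the local model extracts what's relevant per chunk.
    notes = [local_llm(f"Extract facts relevant to: {question}\n\n{c}") for c in chunks]
    # Expensive, smart pass: the cloud model reasons only over the notes.
    return cloud_llm(f"Using only these notes, answer: {question}\n\n" + "\n".join(notes))
```

Secure Minion then goes a step further, running the cloud side inside an H100-backed enclave so even the distilled notes stay protected.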