Bradley Brown
@brad19brown
Live, laugh, lock in | Incoming CS PhD at Stanford, CS Master's Student at the University of Oxford
My fellow code monkeys (@jordanjuravsky @ryansehrlich) and I are excited to release CodeMonkeys: a system for solving SWE-bench issues specifically designed to leverage test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified. A core component of our system…

Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!
✨ Test-Time Scaling for Robotics ✨ Excited to release 🤖 RoboMonkey, which characterizes test-time scaling laws for Vision-Language-Action (VLA) models and introduces a framework that significantly improves the generalization and robustness of VLAs! 🧵(1 / N) 🌐 Website:…
1/10 ML can solve PDEs – but precision🔬is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵How it works:
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
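For a rough sense of what "combining multiple weak verifiers" can look like in code, here is a naive equal-weight sketch (my own illustration: Weaver itself learns how much to trust each verifier rather than averaging them, and all names and numbers below are placeholders):

```python
import numpy as np

def select_response(scores: np.ndarray) -> int:
    """Pick a candidate response by aggregating weak verifier scores.

    scores: shape (num_verifiers, num_candidates), one row per verifier's
    raw scores over the sampled candidate answers.

    Naive equal-weight ensemble for illustration only; the actual method
    estimates per-verifier reliability instead of weighting all verifiers equally.
    """
    # Z-score each verifier's outputs so reward models and LM judges that
    # score on different scales become comparable.
    mu = scores.mean(axis=1, keepdims=True)
    sigma = scores.std(axis=1, keepdims=True) + 1e-8
    normalized = (scores - mu) / sigma
    # Average across verifiers and return the index of the best candidate.
    return int(normalized.mean(axis=0).argmax())

# Example: 3 verifiers scoring 4 sampled answers to the same question.
scores = np.array([
    [0.2, 0.9, 0.4, 0.6],   # reward model A
    [0.1, 0.8, 0.7, 0.5],   # reward model B
    [0.3, 0.7, 0.2, 0.9],   # LM judge (0-1 "is this correct?" score)
])
print(select_response(scores))  # -> 1
```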
Giving LLMs very large amounts of context can be really useful, but it can also be slow and expensive. Could scaling inference-time compute help? In our latest work, we show that allowing models to spend test-time compute to “self-study” a large corpus can >20x decode…
When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x…
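To make the idea of a trained, smaller KV cache concrete, here is a toy single-head sketch. This is my own illustration, not the self-study recipe from the thread: the dimensions, loss, and random training queries are invented. It distills a long frozen cache into a short trainable one by matching attention outputs.

```python
# Toy illustration: distill a long, frozen KV cache for one attention head
# into a much smaller *trainable* cache by matching attention outputs on
# random queries. The real recipe trains on synthetic conversations about
# the corpus with a full model; everything here is simplified for clarity.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n_full, n_small = 64, 4096, 128            # head dim, full vs. compressed cache length

# Frozen KV cache you would get from prefilling the whole document.
K_full = torch.randn(n_full, d)
V_full = torch.randn(n_full, d)

# Much smaller cache, trained offline instead of recomputed per request.
K_small = torch.nn.Parameter(torch.randn(n_small, d) * 0.02)
V_small = torch.nn.Parameter(torch.randn(n_small, d) * 0.02)
opt = torch.optim.Adam([K_small, V_small], lr=1e-2)

def attend(q, K, V):
    # Standard scaled dot-product attention for a batch of queries.
    w = torch.softmax(q @ K.T / d ** 0.5, dim=-1)
    return w @ V

for step in range(2000):
    q = torch.randn(256, d)                    # stand-in for queries seen at test time
    loss = F.mse_loss(attend(q, K_small, V_small),
                      attend(q, K_full, V_full).detach())
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final loss: {loss.item():.4f}, cache reduction: {n_full // n_small}x")
```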
Happy Throughput Thursday to those who celebrate!
Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models. (Joint work with @achakravarthy01, @ryansehrlich, @EyubogluSabri, @brad19brown, @jshetaye,…
We wrote a megakernel! Excited to share how we fused Llama-1B into a single kernel to reach SOTA latency. Check out our blog post and code below!
(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint…
Excited to release SWiRL: A synthetic data generation and multi-step RL approach for reasoning and tool use! With SWiRL, the model’s capability generalizes to new tasks and tools. For example, a model trained to use a retrieval tool to solve multi-hop knowledge-intensive…
Excited to share our new paper on Step-Wise Reinforcement Learning (SWiRL), which uses reinforcement learning and synthetic trajectories to improve multi-step reasoning and tool use! (1/8)
In Large Language Monkeys, we showed the scaling laws of inference-time compute with repeated sampling--the power law relationship between the number of repeated attempts and the fraction of problems solved! The following amazing work theoretically proves the necessary and…
Interested in test time / inference scaling laws? Then check out our newest preprint!! 📉 How Do Large Language Monkeys Get Their Power (Laws)? 📉 arxiv.org/abs/2502.17578 w/ @JoshuaK92829 @sanmikoyejo @Azaliamirh @jplhughes @jordanjuravsky @sprice354_ @aengus_lynch1…
When studying repeated sampling in Large Language Monkeys, we found that the relationship between log(pass@k) and the number of samples often follows a power law. But *why* do we see this scaling law? At first glance, this is surprising, since for a single problem pass@k and k…
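A quick way to see the intuition numerically is the toy simulation below (my own sketch, not the paper's analysis; the Beta distribution of per-problem success rates is an arbitrary choice): per-problem pass@k = 1 - (1 - p)^k is exponential in k, yet averaging it over a heavy-tailed distribution of p yields aggregate coverage that looks like a power law.

```python
# Toy simulation of the intuition in the thread: each problem's pass@k is
# exponential in k, but averaging over problems whose single-attempt success
# rates p pile up near zero produces coverage curves that look like a power
# law on a log-log plot. The Beta distribution is illustrative only.
import numpy as np

rng = np.random.default_rng(0)
p = rng.beta(0.15, 3.0, size=5_000)            # per-problem success probabilities
ks = np.array([1, 4, 16, 64, 256, 1024])

coverage = np.array([(1 - (1 - p) ** k).mean() for k in ks])   # expected pass@k

# If -log(coverage) ~ c * k^(-b), then log(-log(coverage)) is roughly linear in log(k).
x, y = np.log(ks), np.log(-np.log(coverage))
slope = np.polyfit(x, y, 1)[0]
print("pass@k:", np.round(coverage, 3))
print("estimated power-law exponent b ≈", round(-slope, 2))
```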
The Great American AI Race. I wrote something about how we need a holistic AI effort from academia, industry, and the US government to have the best shot at a freer, better educated, and healthier world in AI. I’m a mega bull on the US and open source AI. Maybe we’re cooking…
(1/6) Joyously announcing ThunderKittens with real support on NVIDIA Blackwell! We've released BF16/FP8 GEMM and attention fwd+bwd kernels, up to 2x faster than cuBLAS GEMMs on H100. Blog: bit.ly/41tuT4Q With @realDanFu, @AaryanSinghal4, and @hazyresearch!
(1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding! ⚡️🐱ThunderMLA is up to 35% faster than FlashMLA and just 400 LoC. Blog: bit.ly/4kubAAK With @AaryanSinghal4, @realDanFu, and @hazyresearch!
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
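For a rough picture of what "outperform the PyTorch Eager baseline" means operationally, here is a hedged sketch of a KernelBench-style check (tolerances, timing, and function names are my assumptions, not the benchmark's actual harness): a candidate kernel must match the eager reference numerically and beat it on wall-clock time.

```python
# Hedged sketch: score a candidate implementation against the PyTorch eager
# reference on (1) numerical correctness for random inputs and (2) speedup.
import time
import torch

def reference(x: torch.Tensor) -> torch.Tensor:
    # Eager PyTorch baseline the generated kernel must match.
    return torch.nn.functional.gelu(x) * 2.0

def candidate(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for model-generated code (e.g. a custom CUDA kernel wrapper).
    return torch.nn.functional.gelu(x) * 2.0

def bench(fn, x, iters=50):
    # Simple wall-clock timing; a real harness would warm up, sync CUDA, etc.
    fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters

x = torch.randn(4096, 4096)
correct = torch.allclose(candidate(x), reference(x), atol=1e-4, rtol=1e-4)
speedup = bench(reference, x) / bench(candidate, x)
print(f"correct: {correct}, speedup over eager: {speedup:.2f}x")
```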
we shipp’d 👭 on-device lms and frontier cloud lms. and…they were a match☺️. 98% accuracy, just 17.5% the cloud API costs beyond excited to drop minions: where local lms meet cloud lms 😊 joint work w/@EyubogluSabri & @dan_biderman at @hazyresearch. ty @togethercompute,…
How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (@ollama) with frontier LLMs in the cloud (@openai, @together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost…
All these on-device models are coming out (e.g. llama 3.2). But how can we actually make them useful for hard reasoning workloads (beyond iMessage summarization)? Our idea: give the on-device models your long context and let them communicate with frontier models in the cloud.
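A minimal sketch of that local/cloud split (a simplified one-round version, not the actual Minions protocol; model names, prompts, and the chunking scheme are placeholders): the on-device model does the token-heavy reading, and only its short notes are sent to the frontier model.

```python
# Simplified local/cloud split: the small on-device model reads the long
# document chunk by chunk; the cloud model only sees the extracted notes,
# so the expensive API call stays cheap.
import ollama                      # pip install ollama   (local model via Ollama)
from openai import OpenAI          # pip install openai   (cloud frontier model)

def answer(question: str, document: str, chunk_chars: int = 4000) -> str:
    # 1) Local model handles the long context on-device.
    notes = []
    for i in range(0, len(document), chunk_chars):
        chunk = document[i:i + chunk_chars]
        local = ollama.chat(
            model="llama3.2",
            messages=[{"role": "user",
                       "content": f"Extract anything relevant to: {question}\n\n{chunk}"}],
        )
        notes.append(local["message"]["content"])

    # 2) Cloud model answers from the short notes, never the full document.
    cloud = OpenAI().chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Question: {question}\n\nNotes from a local assistant:\n"
                              + "\n---\n".join(notes)}],
    )
    return cloud.choices[0].message.content
```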