Geoffrey Angus
@GeoffreyAngus
Building stuff. Formerly @Google, @Stanford.
Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. Built in…
can't stop thinking about this one. insanely elegant, seems insanely powerful
Text diffusion models might be the most unintuitive architecture around. Like: let's start randomly filling in words in a paragraph and iterate enough times to get something sensible. But now that Google's Gemini Diffusion is near SOTA, I think we need to take them seriously.
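For intuition, here's a minimal toy sketch of that iterative fill-in idea (masked-token denoising): start from an all-mask sequence, have a denoiser propose tokens with confidences, commit the most confident proposals each step, and repeat. The `score_fn` below is a random stub standing in for a real model; nothing here is Gemini Diffusion's actual algorithm.

```python
# Toy sketch of iterative masked-token denoising, the rough idea behind
# text diffusion decoding. score_fn is a random placeholder for a real denoiser.
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "quietly"]
MASK = "<mask>"

def score_fn(tokens):
    """Placeholder denoiser: propose a token + confidence for every masked slot."""
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(length=8, steps=4):
    tokens = [MASK] * length                      # start from all-mask "noise"
    for step in range(steps):
        proposals = score_fn(tokens)
        if not proposals:
            break
        # Commit only the most confident fraction this step; leave the rest masked.
        keep = max(1, len(proposals) * (step + 1) // steps)
        for i, (tok, _) in sorted(proposals.items(),
                                  key=lambda kv: -kv[1][1])[:keep]:
            tokens[i] = tok
    return tokens

print(diffusion_decode())
```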
The NVIDIA Tensor Core is the most important evolution of computer architecture in the last decade. We explain why and how it's evolved. Shout out to collaborators @bfspector @tri_dao @colfaxintl @charles_irl @ia_buck Neil Movva Jonah Alben, esp. @simonguozirui for the cutest cover pic
NVIDIA Tensor Core Evolution: From Volta to Blackwell. Amdahl's Law, strong scaling, asynchronous execution. Blackwell, Hopper, Ampere, Turing, Volta. semianalysis.com/2025/06/23/nvi…
This looks super cool. Our own research team was exploring similar ideas for building an internal corpus of context for our content generation tasks. Now we just got a huge head start on it!
When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x…
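To make the idea concrete, here's a heavily simplified PyTorch sketch of training a small KV "cartridge" against a frozen toy model: the prefix keys and values are the only trainable parameters, optimized offline so the model can predict the document's tokens. This is only an illustration of a trainable prefix KV cache, not the actual self-study recipe (which trains on synthetic conversations about the corpus) and not the real model architecture.

```python
# Minimal sketch: freeze the "model", make a short prefix of key/value vectors
# trainable, and optimize just that prefix on the document. Toy single-layer setup.
import torch
import torch.nn.functional as F

d_model, n_prefix, seq_len, vocab = 64, 16, 32, 100

# Frozen "model": one attention projection + LM head (stand-ins for a real LLM).
Wq = torch.randn(d_model, d_model) / d_model**0.5
Wo = torch.randn(d_model, vocab) / d_model**0.5
embed = torch.randn(vocab, d_model)

# The "cartridge": trainable keys/values prepended to every query's attention.
prefix_k = torch.randn(n_prefix, d_model, requires_grad=True)
prefix_v = torch.randn(n_prefix, d_model, requires_grad=True)
opt = torch.optim.Adam([prefix_k, prefix_v], lr=1e-2)

tokens = torch.randint(0, vocab, (seq_len,))      # toy "document" tokens
for step in range(200):
    x = embed[tokens]                             # (seq_len, d_model)
    q = x @ Wq
    attn = F.softmax(q @ prefix_k.T / d_model**0.5, dim=-1)
    logits = (attn @ prefix_v) @ Wo               # predict next token from the prefix
    loss = F.cross_entropy(logits[:-1], tokens[1:])
    opt.zero_grad(); loss.backward(); opt.step()
print("final loss:", loss.item())
```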
Excited to introduce #CollabLLM -- a method to train LLMs to collaborate better w/ humans! Selected as #icml2025 oral (top 1%)🏅 New multi-turn training objective + user simulator👇
Even the smartest LLMs can fail at basic multi-turn communication. Ask for grocery help → it answers without asking where you live 🤦♀️ Ask it to write articles → it assumes your preferences 🤷🏻♀️ ⭐️CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators…
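Purely for flavor, here's a toy sketch of the general shape such training implies: roll out a whole conversation against a simulated user and reward the assistant at the conversation level (e.g., for eliciting the user's hidden goal) rather than per single reply. Everything here (simulated_user, assistant, the reward) is invented for illustration and is not the CollabLLM objective.

```python
# Toy multi-turn rollout against a simulated user, scored at conversation level.
import random

def simulated_user(history, hidden_goal):
    """Toy user simulator: answers clarifying questions, otherwise restates the task."""
    last = history[-1] if history else ""
    if "?" in last:
        return f"My goal is: {hidden_goal}"
    return "Please help me with my task."

def assistant(history):
    """Placeholder policy: sometimes asks a clarifying question, sometimes just answers."""
    if random.random() < 0.5:
        return "Could you tell me more about what you actually need?"
    return "Here is my best attempt at an answer."

def rollout(hidden_goal, max_turns=4):
    history, asked = [], False
    for _ in range(max_turns):
        reply = assistant(history)
        asked |= "?" in reply
        history += [reply, simulated_user(history + [reply], hidden_goal)]
    # Conversation-level reward: did the assistant ever elicit the user's goal?
    reward = 1.0 if asked else 0.0
    return history, reward

print(rollout("buy groceries near 94305"))
```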
Cartridges, powered by Tokasaurus! 🤝⚡️🦖
Struggling with context management? Wish you could just stick it all in your model? We’ve integrated Cartridges, a new method of leveraging sleep-time compute for learning long contexts, into Tokasaurus, an inference engine optimized for high throughput 🧵
An advantage of training a cache/prefix (as opposed to a LoRA adapter) is that we can serve per-user cartridges using the same optimizations and kernels that inference engines already use for per-user KV caches. @GeoffreyAngus just integrated cartridges into Tokasaurus (a…
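A rough sketch of that serving point, with hypothetical names (CartridgeStore, serve_request; none of this is the Tokasaurus API): a per-user cartridge is just a per-user KV prefix, so attaching it to a request is the same concatenation the engine already does for cached prefixes, with no adapter-specific kernels required.

```python
# Hypothetical serving-side illustration: store trained per-user KV prefixes and
# prepend them to a request's KV cache exactly like any other cached prefix.
import torch

class CartridgeStore:
    def __init__(self):
        self._store = {}                          # user_id -> (keys, values)

    def put(self, user_id, kv):
        self._store[user_id] = kv

    def get(self, user_id):
        return self._store.get(user_id)

def serve_request(prompt_kv, user_id, store):
    """Prepend the user's trained cartridge (if any) to the prompt's KV cache."""
    cartridge = store.get(user_id)
    if cartridge is None:
        return prompt_kv
    k_c, v_c = cartridge
    k_p, v_p = prompt_kv
    # Same concatenation the engine already performs for shared/cached prefixes.
    return torch.cat([k_c, k_p], dim=0), torch.cat([v_c, v_p], dim=0)

store = CartridgeStore()
store.put("user_42", (torch.zeros(16, 64), torch.zeros(16, 64)))
k, v = serve_request((torch.zeros(8, 64), torch.zeros(8, 64)), "user_42", store)
print(k.shape)  # torch.Size([24, 64])
```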
.@togethercompute API has the fastest DeepSeek V3 endpoint (2x faster than the next best API endpoint, and almost 5x faster than the DeepSeek API). See how to use it directly with @cline to make all your Cline workflows snappier!
You can now use @cline with DeepSeek V3 on Together! Here's a guide on exactly how to do that in 3 steps.
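If you want to sanity-check the Together side before configuring Cline, Together's API is OpenAI-compatible, so a quick script like this works; the base URL and the DeepSeek V3 model ID below are the commonly documented values, but confirm them against Together's model page. In Cline, you'd then point an OpenAI-compatible provider at the same base URL, API key, and model ID.

```python
# Quick check that your Together API key and the DeepSeek V3 model ID respond,
# using the standard OpenAI client against Together's OpenAI-compatible endpoint.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```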
Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models. (Joint work with @achakravarthy01, @ryansehrlich, @EyubogluSabri, @brad19brown, @jshetaye,…