Horace He
@cHHillee
@thinkymachines Formerly @PyTorch "My learning style is Horace twitter threads" - @typedfemale
For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. pytorch.org/blog/flexatten… 1/10
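For context, the whole API surface is roughly this small (a minimal sketch based on the blog post's causal example; shapes are illustrative, and in practice you'd wrap flex_attention in torch.compile to get the fused kernel):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda") for _ in range(3))

# score_mod is called on each attention score; returning -inf masks it out.
def causal(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, float("-inf"))

out = flex_attention(q, k, v, score_mod=causal)
```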

When this word started popping up, I initially smugly thought that people were misspelling "syncophant", only to realize that I'd entangled "sycophant" with "syncopation" in my head.
people using sycophant like they knew what it was
Most normal FlexAttention mask. Also, thanks for the "Implementation-wise, although FlexAttention practically enabled the project..." comment - that's perhaps the #1 thing we were hoping for with FlexAttention :)
Depending on how dense you want the neighbourhood for your local attention to be, the attention matrix is very sparse, and FlexAttention can exploit that sparsity, since the functional form of the mask turns out to be pretty simple. 10/N 🧵
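Concretely, such a local-attention mask is a one-line mask_mod (a sketch assuming a neighbourhood radius of 256; the simple functional form is exactly what lets the kernel skip fully-masked blocks):

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

WINDOW = 256  # assumed neighbourhood radius for local attention

# Attend only to positions within WINDOW of the query; this simple functional
# form compiles to block sparsity rather than a dense seq_len^2 mask.
def local_mask(b, h, q_idx, kv_idx):
    return (q_idx - kv_idx).abs() <= WINDOW

q, k, v = (torch.randn(1, 8, 4096, 64, device="cuda") for _ in range(3))
block_mask = create_block_mask(local_mask, B=None, H=None, Q_LEN=4096, KV_LEN=4096)
out = flex_attention(q, k, v, block_mask=block_mask)  # fully-masked blocks are skipped
```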
Other than OpenAI, how many AI efforts do you think will have gotten a gold medal at the IMO? Several other AI labs are vagueposting about their IMO results, but seem to be abiding by the IMO's request for a week's delay.
It's been an exciting 3 months at Thinky and so much has happened already! Imo we're building some of the best research infra around. Research infra is about jointly optimizing researcher *and* GPU efficiency, and it's been a joy to work on this with the other great folk here!
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
I'll be at MLSys today! DM me if you want to chat about PyTorch, ML systems, or life at Thinking Machines!
The fundamental question here (computing MFU) is a very reasonable question to ask in an interview (and I'd recommend learning how to do it if you don't know). However, the real interview question I would like to ask is this: "I see 3 assumptions in this question that range from…
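For the curious, the standard back-of-envelope MFU calculation looks like this (a sketch using the common 6 * params FLOPs-per-token approximation; every hardware and throughput number below is assumed for illustration):

```python
# MFU = achieved model FLOPs/s divided by peak hardware FLOPs/s.
# Uses the common 6 * n_params FLOPs/token estimate (fwd + bwd, ignoring
# attention FLOPs). Every number here is illustrative, not measured.
n_params = 7e9            # hypothetical 7B-parameter model
tokens_per_sec = 50_000   # hypothetical measured training throughput
peak_flops = 989e12       # e.g. one H100's BF16 dense peak
n_gpus = 8

mfu = (6 * n_params * tokens_per_sec) / (peak_flops * n_gpus)
print(f"MFU: {mfu:.1%}")  # ~26.5% with these numbers
```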
Saw this on Reddit with half the comments shitting on it
This is pretty neat. They hook into torch.compile and insert profile-guided optimizations, as well as a bunch of other specific optimizations like offloading. Since torch.compile is all in Python, all their compiler passes are fairly accessible too (toy sketch below)! github.com/deepspeedai/De…
Introducing 🚀DeepCompile🚀: compiler-based distributed training optimizations. - Automatic parallelization & profile-guided optimizations - Enable ZeRO1, ZeRO3, Offloading, etc. via compiler passes - 1.2X-7X speedups over manual ZeRO1/ZeRO3/Offloading tinyurl.com/8cys28xk
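The "accessible in Python" point is easy to see with torch.compile's custom-backend hook (a toy sketch of the generic hook, not DeepCompile's actual pass machinery):

```python
import torch

# A toy torch.compile backend: the captured FX graph arrives as a plain
# Python object, so an analysis or rewrite pass is just Python code here.
def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    gm.graph.print_tabular()  # a real pass would rewrite the graph instead
    return gm.forward         # run the (unmodified) graph eagerly

@torch.compile(backend=inspect_backend)
def f(x):
    return torch.relu(x) + 1

f(torch.randn(4))
```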
I'll be here talking about ML systems! There'll be some of the best GPU folk I know here, so come learn more about Blackwell GPUs together!
SemiAnalysis is hosting an Nvidia Blackwell GPU Hackathon on Sunday March 16th. It is the ultimate playground for Blackwell PTX tech enthusiasts, offering hands-on exploration of Blackwell & PTX infrastructure while collaborating on open-source projects.
It is hard to overstate how cool and powerful flex attention is. @cHHillee pytorch.org/blog/flexatten… TL;DR: it is an implementation of the attention operator in @pytorch that, in particular, lets you efficiently "carve" the attention matrix. 1/3