Ali Hassani
@AliHassaniJr
Computer Science PhD Student at Georgia Tech. I like fast computers. I like AI. I love the combination.
developer.nvidia.com/blog/cutlass-p… marks the start of a short series of blog posts about CUTLASS 3.x and CuTe that we've been meaning to write for years. There are a few more parts still to come; hope you enjoy!
NATTEN 0.21 ships Hopper and Blackwell FNA backward kernels, enabling much faster training on those architectures. Accelerate your training workloads with NATTEN today! github.com/SHI-Labs/NATTE…
We are releasing a major NATTEN upgrade that brings you new Hopper & Blackwell sparse attention kernels, both capable of realizing Theoretical Max Speedup: 90% sparsity -> 10X speedup. Thanks to the great efforts by @AliHassaniJr & @NVIDIA cutlass team! natten.org
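As a quick sanity check on the arithmetic (not part of the original post): if a sparse kernel skips masked-out blocks entirely, attention cost scales with density, so the best possible speedup over dense attention is 1 / (1 - sparsity). A minimal Python sketch:

```python
# Back-of-the-envelope bound behind the "90% sparsity -> 10X" claim:
# if skipped blocks cost nothing, FLOPs scale with density, so the
# theoretical max speedup over dense attention is 1 / (1 - sparsity).
def theoretical_max_speedup(sparsity: float) -> float:
    """Upper bound on speedup when masked-out work is skipped entirely."""
    assert 0.0 <= sparsity < 1.0
    return 1.0 / (1.0 - sparsity)

print(round(theoretical_max_speedup(0.90), 2))  # 10.0 -> 10X at 90% sparsity
```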
Over 4 years into our journey bridging Convolutions and Transformers, we introduce Generalized Neighborhood Attention—Multi-dimensional Sparse Attention at the Speed of Light: github.com/SHI-Labs/NATTEN A collaboration with the best minds in AI and HPC. 🐝🟩🟧 @gtcomputing @nvidia
NATTEN 0.20.0 brings you our Hopper and Blackwell FNA kernels, Strided NA, improved user experience, a profiling toolkit, and more! Oh, and we have new docs: natten.org. Run your sparse local attention at the Speed of Light today!
Oh Hopper, of course we didn't forget you...
Wondering what's happening with NATTEN in 2025? Check out Generalized Neighborhood Attention! Spoiler: NATTEN gets a new stride parameter, we made a simulator for all your analytical studies, AND a Blackwell kernel! Keep reading for more... (1 / 5)
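A hedged usage sketch of the new stride parameter, assuming NATTEN's functional na2d entry point and its heads-last tensor layout; exact signatures in the released API may differ:

```python
# Hypothetical sketch, NOT verified against the NATTEN 0.20 API.
# The GNA release describes a `stride` parameter alongside the existing
# `kernel_size` and `dilation` of neighborhood attention.
import torch
from natten.functional import na2d  # assumed functional entry point

B, H, W, heads, dim = 1, 32, 32, 4, 64  # toy sizes for illustration
q = torch.randn(B, H, W, heads, dim, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# kernel_size sets the local window; stride > 1 makes nearby queries share
# one window, increasing block sparsity (stride=1 recovers standard NA).
out = na2d(q, k, v, kernel_size=(7, 7), stride=(2, 2))
```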
Tomorrow (5/17), CuTe creator Cris Cecka will teach CuTe himself on GPU Mode. youtube.com/watch?v=ufa4pm…
🚨🔥 CUTLASS 4.0 is released 🔥🚨
pip install nvidia-cutlass-dsl
4.0 marks a major shift for CUTLASS: towards native GPU programming in Python.
[image: "Hello World" slide]
docs.nvidia.com/cutlass/media/…
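A minimal "hello world" in the new Python DSL, written as a sketch modeled on the CUTLASS 4.0 quick-start examples; the cute.kernel/cute.jit decorators, launch call, and context helper are assumptions from memory, not a verified listing:

```python
# Sketch of a CuTe DSL "hello world", modeled on the CUTLASS 4.0
# quick-start; decorator and helper names are assumptions.
import cutlass
import cutlass.cute as cute

@cute.kernel
def say_hello():
    # Only one thread in the block prints.
    tidx, _, _ = cute.arch.thread_idx()
    if tidx == 0:
        cute.printf("Hello from the GPU")

@cute.jit
def main():
    # Launch a single block of 32 threads on the current device.
    say_hello().launch(grid=(1, 1, 1), block=(32, 1, 1))

if __name__ == "__main__":
    cutlass.cuda.initialize_cuda_context()  # assumed setup helper
    main()
```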