Ali Hassani
@AliHassaniJr
Computer Science PhD Student at Georgia Tech. I like fast computers. I like AI. I love the combination.
developer.nvidia.com/blog/cutlass-p… marks the start of a short series of blog posts about CUTLASS 3.x and CuTe that we've been meaning to write for years. There are a few more parts still to come; hope you enjoy!
NATTEN 0.21 ships Hopper and Blackwell FNA backward kernels, enabling much faster training on those architectures. Accelerate your training workloads with NATTEN today! github.com/SHI-Labs/NATTE…
We are releasing a major NATTEN upgrade that brings you new Hopper & Blackwell sparse attention kernels, both capable of realizing Theoretical Max Speedup: 90% sparsity -> 10X speedup. Thanks to the great efforts by @AliHassaniJr & @NVIDIA cutlass team! natten.org
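As a quick sanity check on the arithmetic (not part of the original post): if a sparse kernel skips masked-out blocks entirely, attention cost scales with density, so the best possible speedup over dense attention is 1 / (1 - sparsity). A minimal Python sketch:

```python
# Back-of-the-envelope bound behind the "90% sparsity -> 10X" claim:
# if skipped blocks cost nothing, FLOPs scale with density, so the
# theoretical max speedup over dense attention is 1 / (1 - sparsity).
def theoretical_max_speedup(sparsity: float) -> float:
    """Upper bound on speedup when masked-out work is skipped entirely."""
    assert 0.0 <= sparsity < 1.0
    return 1.0 / (1.0 - sparsity)

print(round(theoretical_max_speedup(0.90), 2))  # 10.0 -> 10X at 90% sparsity
```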
Over 4 years into our journey bridging Convolutions and Transformers, we introduce Generalized Neighborhood Attention—Multi-dimensional Sparse Attention at the Speed of Light: github.com/SHI-Labs/NATTEN A collaboration with the best minds in AI and HPC. 🐝🟩🟧 @gtcomputing @nvidia
NATTEN 0.20.0 brings you our Hopper and Blackwell FNA kernels, Strided NA, improved user experience, a profiling toolkit, and more! Oh, and we have new docs: natten.org. Run your sparse local attention at the Speed of Light today!
Oh Hopper, of course we didn't forget you...
Wondering what's happening with NATTEN in 2025? Check out Generalized Neighborhood Attention! Spoiler: NATTEN gets a new stride parameter, we made a simulator for all your analytical studies, AND a Blackwell kernel! Keep reading for more... (1 / 5)
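A hedged usage sketch of the new stride parameter, assuming NATTEN's functional na2d entry point and its heads-last tensor layout; exact signatures in the released API may differ:

```python
# Hypothetical sketch, NOT verified against the NATTEN 0.20 API.
# The GNA release describes a `stride` parameter alongside the existing
# `kernel_size` and `dilation` of neighborhood attention.
import torch
from natten.functional import na2d  # assumed functional entry point

B, H, W, heads, dim = 1, 32, 32, 4, 64  # toy sizes for illustration
q = torch.randn(B, H, W, heads, dim, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# kernel_size sets the local window; stride > 1 makes nearby queries share
# one window, increasing block sparsity (stride=1 recovers standard NA).
out = na2d(q, k, v, kernel_size=(7, 7), stride=(2, 2))
```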
Tomorrow (5/17), CuTe creator Cris Cecka will teach CuTe himself on GPU Mode. youtube.com/watch?v=ufa4pm…
🚨🔥 CUTLASS 4.0 is released 🔥🚨
pip install nvidia-cutlass-dsl
4.0 marks a major shift for CUTLASS: towards native GPU programming in Python.
[image: "Hello World" slide]
docs.nvidia.com/cutlass/media/…
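A minimal "hello world" in the new Python DSL, written as a sketch modeled on the CUTLASS 4.0 quick-start examples; the cute.kernel/cute.jit decorators, launch call, and context helper are assumptions from memory, not a verified listing:

```python
# Sketch of a CuTe DSL "hello world", modeled on the CUTLASS 4.0
# quick-start; decorator and helper names are assumptions.
import cutlass
import cutlass.cute as cute

@cute.kernel
def say_hello():
    # Only one thread in the block prints.
    tidx, _, _ = cute.arch.thread_idx()
    if tidx == 0:
        cute.printf("Hello from the GPU")

@cute.jit
def main():
    # Launch a single block of 32 threads on the current device.
    say_hello().launch(grid=(1, 1, 1), block=(32, 1, 1))

if __name__ == "__main__":
    cutlass.cuda.initialize_cuda_context()  # assumed setup helper
    main()
```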