Austin Silveria
@austinsilveria
communication bandwidth/latency @togethercompute research intern
Training-free acceleration of Diffusion Transformers with dynamic sparsity and cross-step attention/MLP deltas--collaboration with @SohamGovande and @realDanFu! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!
Chipmunks can now hop across multiple GPU architectures (sm_80, sm_89, sm_90). You can get a 1.4-3x lossless speedup when generating videos on A100s, 4090s, and H100s! Chipmunks also play with more open-source models: Mochi, Wan, & others (w/ tutorials for integration) 🐿️
Some updates to Chipmunk! 🐿️ Chipmunk now supports Wan 2.1, with up to 2.67x speedup - completely training-free! The paper is up on arXiv - take a look to see more in-depth analysis of sparsity in video models. Only 5-25% of activations account for >90% of the output!
chipmunk is up on arxiv! across HunyuanVideo and Flux.1-dev, 5-25% of the intermediate activation values in attention and MLPs account for 70-90% of the change in activations across steps. caching + sparsity speeds up generation by only recomputing the fast-changing activations
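The caching + sparsity idea from the tweet above can be illustrated with a toy NumPy sketch: keep the previous step's activations, and refresh only the fraction that changed the most. The function name, `keep_frac` parameter, and dense computation of `fresh` are my own illustrative assumptions; in the real kernels, only the selected entries would actually be recomputed.

```python
import numpy as np

def sparse_delta_update(cache, fresh, keep_frac=0.1):
    """Refresh only the fastest-changing activations across diffusion steps.

    cache: activations cached from the previous step
    fresh: current-step activations (computed densely here only for
           illustration; a sparse kernel would compute just the top-k)
    keep_frac: fraction of entries to recompute (e.g. 0.05-0.25)
    """
    delta = np.abs(fresh - cache)
    k = max(1, int(keep_frac * delta.size))
    # flat indices of the top-k fastest-changing entries
    idx = np.unravel_index(np.argsort(delta, axis=None)[-k:], delta.shape)
    out = cache.copy()
    out[idx] = fresh[idx]  # refresh the fast-changing 5-25%; reuse the rest
    return out
```

With `keep_frac=0.25`, 75% of the activations are served straight from the cross-step cache, which is the source of the speedup when a small fraction of entries carries most of the change.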
introducing chipmunk—a training-free algorithm making ai video generation 3.7x & image gen 1.6x faster! ⚡️ our kernels for column-sparse attention are 9.3x faster than FlashAttention-3 and column-sparse GEMM is 2.5x faster vs. cuBLAS a thread on the GPU kernel optimizations 🧵
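A toy dense-gather version of what column-sparse attention computes: softmax attention restricted to a chosen subset of key/value columns. This is only a sketch of the math, not the CUDA kernels the thread benchmarks; the function name and `col_idx` parameter are mine, and the real implementation fuses the gather into the attention kernel rather than materializing `ks`/`vs`.

```python
import numpy as np

def column_sparse_attention(q, k, v, col_idx):
    """Attend only to a selected subset of key/value columns.

    q: (n, d) queries; k, v: (m, d) keys/values
    col_idx: indices of the active key/value columns for this step
    """
    ks, vs = k[col_idx], v[col_idx]            # gather active columns
    scores = q @ ks.T / np.sqrt(q.shape[-1])   # (n, len(col_idx))
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)              # softmax over kept columns
    return w @ vs
```

When `col_idx` selects every column this reduces to ordinary softmax attention; the speedup comes from selecting only the small, dynamically chosen fraction of columns that matters at each step.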
Our latest joint work w/ SandyResearch @ UCSD: training-free acceleration of Diffusion Transformers w/ dynamic sparsity, led by @austinsilveria @SohamGovande! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!
Super excited to share Chipmunk 🐿️- training-free acceleration of diffusion transformers (video, image generation) with dynamic attention & MLP sparsity! Led by @austinsilveria, @SohamGovande - 3.7x faster video gen, 1.6x faster image gen. Kernels written in TK ⚡️🐱 1/