soham
@SohamGovande
@OpenAI 🌲 stanford cs | prev @nvidia @hazyresearch
introducing chipmunk: a training-free algorithm making ai video generation 3.7x & image gen 1.6x faster! ⚡️ our kernels for column-sparse attention are 9.3x faster than FlashAttention-3, and column-sparse GEMM is 2.5x faster than cuBLAS. a thread on the GPU kernel optimizations 🧵
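the real kernels are CUDA, but here's a minimal NumPy sketch of the column-sparse idea (not the actual Chipmunk implementation): instead of attending over all key/value columns, keep only the most important fraction of columns and run softmax attention over that subset. the scoring heuristic (mean attention score per column) and `keep_frac` are illustrative assumptions.

```python
import numpy as np

def column_sparse_attention(q, k, v, keep_frac=0.25):
    """Sketch of column-sparse attention: attend over only the
    top fraction of key/value columns, ranked by a cheap importance score.
    Illustrative only -- the real kernels do this inside a fused CUDA kernel."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (n_q, n_k) attention logits
    col_importance = scores.mean(axis=0)           # mean logit per key column
    n_keep = max(1, int(keep_frac * k.shape[0]))
    cols = np.argsort(col_importance)[-n_keep:]    # indices of kept columns
    s = scores[:, cols]                            # restrict to kept columns
    p = np.exp(s - s.max(axis=-1, keepdims=True))  # numerically stable softmax
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v[cols]                             # weighted sum over kept values
```

with `keep_frac=1.0` this reduces exactly to dense attention; the speedup comes from the GEMMs only touching `n_keep` columns instead of all of them.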
Our latest joint work w/ SandyResearch @ UCSD: training-free acceleration of Diffusion Transformers w/ dynamic sparsity, led by @austinsilveria @SohamGovande! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!
After 2 years at @nvidia, I’m writing to share that I’ll start a new adventure. Working with brilliant teammates on cutting‑edge AI has shaped me so much: - Cosmos debuted as a SOTA world model and earned 8k ⭐️ on GitHub. - We open‑sourced the first recipe for upcycling 100B+…
Two papers at the workshop I’m a bit fond of… @austinsilveria and @SohamGovande are going to be presenting Chipmunk - come chat with them about how they made video diffusion 3.7x faster! (With custom column-sparse attention kernels) 3/
Training-free acceleration of Diffusion Transformers with dynamic sparsity and cross-step attention/MLP deltas, a collaboration with @SohamGovande and @realDanFu! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!
I've gotten great results just from asking "how would someone much better than me approach this?" It's helped me learn fast as a beginner, and at times go from good to world-class. Mimicry & simulation are superpowers we overlook in favor of learning "the right way." More below.
"how would someone much much better than me approach this?" also annoyingly OP
Thrilled to share that I’ve joined @reflection_ai! We’re building superintelligent autonomous systems by co-designing research and product. Today, we’re launching Asimov. As AI benchmarks saturate, evaluation will increasingly live inside real-world products that are…
Engineers spend 70% of their time understanding code, not writing it. That’s why we built Asimov at @reflection_ai: the best-in-class code research agent, built for teams and organizations.
🐿️ chipmunk ship! flux kontext supported for up to 30% faster cute chipmunks!
Some updates to Chipmunk! 🐿️ Chipmunk now supports Wan 2.1, with up to 2.67x speedup - completely training-free! The paper is up on arXiv - take a look to see more in-depth analysis of sparsity in video models. Only 5-25% of activations account for >90% of the output!
chipmunk is up on arxiv! across HunyuanVideo and Flux.1-dev, 5-25% of the intermediate activation values in attention and MLPs account for 70-90% of the change in activations across steps. caching + sparsity speeds up generation by only recomputing fast-changing activations
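a toy sketch of the caching idea (assumed shape, not the paper's implementation): keep the previous step's activations, and overwrite only the fraction that changed fastest. here the full new activations are computed just to rank the deltas; the real method avoids that by computing only a sparse subset.

```python
import numpy as np

def cached_sparse_update(prev_act, new_act_fn, recompute_frac=0.1):
    """Sketch of cross-step activation caching: reuse last step's
    activations and refresh only the fastest-changing entries.
    Toy version -- it computes the full new activations to find them,
    which a real sparse kernel would avoid."""
    full = new_act_fn()                        # dense recompute, for illustration
    delta = np.abs(full - prev_act)            # how much each activation moved
    n = max(1, int(recompute_frac * delta.size))
    idx = np.argsort(delta.ravel())[-n:]       # fastest-changing entries
    out = prev_act.ravel().copy()
    out[idx] = full.ravel()[idx]               # refresh only those entries
    return out.reshape(prev_act.shape)
```

the observation in the tweet is what makes this work: since a small slice of activations carries most of the cross-step change, refreshing only that slice preserves quality while skipping most of the compute.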
super fun to work on this :)
Abundance campus organizing mentioned in the WSJ :) I care about this movement because my generation grew up in an era of dysfunctional, antagonistic, and reactive politics. Abundance focuses on outcomes, progress, and a vision for better days ahead. More coming soon!
(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint…
terrific work @Avanika15 & team! hybrid local and cloud LLM interactions are the future
can you chat privately with a cloud llm—*without* sacrificing speed? excited to release minions secure chat: an open-source protocol for end-to-end encrypted llm chat with <1% latency overhead (even @ 30B+ params!). cloud providers can’t peek—messages decrypt only inside a…