Gashon Hussein
@GashonHussein
Stanford
Excited to share our new paper, "One-Minute Video Generation with Test-Time Training (TTT)" in collaboration with NVIDIA. We augment a pre-trained Transformer with TTT-layers and finetune it to generate one-minute Tom and Jerry cartoons with strong temporal and spatial…

n-simplex attention makes incredible sense because of its honesty: it literally says you can put more compute into the attention operation to get more gains, a trend we've seen so many times. This differs from a lot of 'suspicious' claims, such as that you can use less compute to perform…
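For intuition, here is a minimal NumPy sketch contrasting standard pairwise attention with a trilinear "2-simplex" variant in which each query scores pairs of key positions. The function names, the elementwise-product combination of the two value vectors, and the shapes are illustrative assumptions, not the formulation from the thread; the point is only that the higher-order score tensor costs O(n^3) where standard attention costs O(n^2).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_1simplex(Q, K, V):
    # Standard (pairwise) attention: an (n, n) score matrix.
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def attention_2simplex(Q, K1, K2, V1, V2):
    # Trilinear attention: each query i attends to *pairs* (j, k),
    # so the score tensor is (n, n, n) -- strictly more compute.
    n, d = Q.shape
    scores = np.einsum('id,jd,kd->ijk', Q, K1, K2) / np.sqrt(d)
    w = softmax(scores.reshape(n, -1)).reshape(n, n, n)
    # Illustrative choice: a pair's value is the elementwise product
    # of the two value vectors.
    return np.einsum('ijk,jd,kd->id', w, V1, V2)

n, d = 16, 8
rng = np.random.default_rng(0)
Q, K1, K2, V1, V2 = (rng.standard_normal((n, d)) for _ in range(5))
print(attention_1simplex(Q, K1, V1).shape)       # (16, 8)
print(attention_2simplex(Q, K1, K2, V1, V2).shape)  # (16, 8)
```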
Our models need to run in real time on real robots, but inference with big VLAs takes a long time. We developed Real-Time Action Chunking (RTC) to enable real-time inference with flow matching for the π0 and π0.5 VLAs! More in the thread👇
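The tweet doesn't spell out the mechanism, but here is a rough Python sketch of the general idea as I understand it: the next action chunk is sampled by integrating a flow-matching velocity field while the actions that overlap with the chunk already executing are held consistent. The hard clamp of the overlap and the `velocity_model` stand-in are illustrative assumptions, not PI's actual RTC algorithm (which is described in the linked thread).

```python
import numpy as np

def sample_chunk_realtime(velocity_model, prev_chunk, overlap, horizon, dim, steps=10):
    """Flow-matching sampling of an action chunk with a frozen overlap.

    velocity_model(x, t) -> velocity is an assumed stand-in for the VLA's
    action expert. The first `overlap` actions are clamped to the tail of
    the previous chunk, so the robot keeps executing smoothly while the
    new chunk is being denoised.
    """
    x = np.random.standard_normal((horizon, dim))  # start from noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x[:overlap] = prev_chunk[-overlap:]        # pin the committed actions
        x = x + dt * velocity_model(x, t)          # Euler step along the flow
    x[:overlap] = prev_chunk[-overlap:]
    return x

# Toy usage with a dummy velocity field pulling actions toward zero.
dummy = lambda x, t: -x
prev = np.ones((50, 7))
chunk = sample_chunk_realtime(dummy, prev, overlap=10, horizon=50, dim=7)
print(chunk.shape)  # (50, 7)
```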
Fun project at PI: knowledge insulation for VLAs. We figured out how to train VLAs with continuous actions much more effectively by insulating the VLM and training it with discrete actions, while the action expert learns on top. 5-7x faster, and importantly, way better language following…
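A minimal PyTorch sketch of the insulation idea, assuming the mechanism is a stop-gradient: the VLM backbone trains on a discrete (tokenized) action loss, while the continuous action expert reads the backbone's features through `.detach()`, so its loss can never perturb the VLM. The toy modules and the MSE loss (standing in for the expert's actual objective) are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVLM(nn.Module):
    """Stand-in for the VLM backbone: returns features and token logits."""
    def __init__(self, d=32, vocab=256):
        super().__init__()
        self.body = nn.Linear(d, d)
        self.head = nn.Linear(d, vocab)
    def forward(self, x):
        h = torch.relu(self.body(x))
        return h, self.head(h)

def insulated_loss(vlm, expert, obs, discrete_targets, cont_targets):
    feats, logits = vlm(obs)
    # Discrete (tokenized) action loss: this is what trains the backbone.
    loss_disc = F.cross_entropy(logits.flatten(0, 1), discrete_targets.flatten())
    # Stop-gradient: the continuous expert learns on top of frozen features,
    # so its gradients are insulated from the VLM.
    pred = expert(feats.detach())
    loss_cont = F.mse_loss(pred, cont_targets)
    return loss_disc + loss_cont

vlm, expert = ToyVLM(), nn.Linear(32, 7)   # expert = toy continuous action head
obs = torch.randn(4, 10, 32)
loss = insulated_loss(vlm, expert, obs,
                      torch.randint(0, 256, (4, 10)), torch.randn(4, 10, 7))
loss.backward()
```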
We’re excited to announce Sunflower Capital Funds I and II. Sunflower is a $250m fund that partners at the earliest stage with companies building foundations for modern enterprises, critical industries, and the physical world.
We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️
introducing chipmunk—a training-free algorithm making ai video generation 3.7x & image gen 1.6x faster! ⚡️ our kernels for column-sparse attention are 9.3x faster than FlashAttention-3 and column-sparse GEMM is 2.5x faster vs. cuBLAS a thread on the GPU kernel optimizations 🧵
Our latest joint work w/ SandyResearch @ UCSD: training-free acceleration of Diffusion Transformers w/ dynamic sparsity, led by @austinsilveria @SohamGovande! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!
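A NumPy sketch of what column-sparse attention looks like at a high level: the softmax and the value GEMM run over only a chosen subset of key/value columns, so both shrink. The selection heuristic below, and the dense score pass used to pick columns, are purely illustrative; the actual kernels choose columns dynamically across diffusion steps and skip the dense work on GPU, which is where the speedup comes from.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def column_sparse_attention(Q, K, V, keep_ratio=0.25):
    """Attend to a subset of key/value columns only.

    Column choice (largest mean |score|) is an illustrative stand-in
    for whatever heuristic the real kernels use.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                   # dense, for selection only
    k = max(1, int(keep_ratio * n))
    cols = np.argsort(-np.abs(scores).mean(0))[:k]  # keep the k hottest columns
    return softmax(scores[:, cols]) @ V[cols]       # small softmax + small GEMM

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
print(column_sparse_attention(Q, K, V).shape)  # (64, 32)
```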
I built Orchestrator with @jameszhou02, a proof of concept for how we envision the future of software engineering. In the future, every engineer will manage swarms of AI engineers that execute their plans in parallel. Orchestrator takes an input prompt and creates a plan that…
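The tweet is truncated, but the fan-out pattern it describes is easy to sketch. A hypothetical asyncio version, with make_plan and run_agent as stand-ins for the model-backed planner and worker agents (not Orchestrator's actual code):

```python
import asyncio

def make_plan(prompt: str) -> list[str]:
    """Stand-in planner: a real orchestrator would ask a model to decompose."""
    return [f"{prompt} / subtask {i}" for i in range(4)]

async def run_agent(task: str) -> str:
    """Stand-in for dispatching one task to an AI engineer (e.g. an API call)."""
    await asyncio.sleep(0.1)  # simulate model latency
    return f"done: {task}"

async def orchestrate(prompt: str) -> list[str]:
    # Fan the plan out to a swarm of workers and gather results in parallel.
    return await asyncio.gather(*(run_agent(t) for t in make_plan(prompt)))

print(asyncio.run(orchestrate("add dark mode")))
```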
One of the neat side effects of initializing from a pre-trained Transformer is that we can generate videos of locations that weren’t in the original Tom and Jerry cartoons. “Around the World” - A 30-second video from earlier in training.
Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by…
AI (using TTT) now creates one-minute-long videos from a single prompt! Researchers have developed a method for creating one-minute videos with particularly fluid motion and high temporal consistency. To do this, they use test-time training (TTT) and integrate…
Test-Time Training (TTT) now works on video! And not just a 5-second video: we can generate a full 1-minute video! The TTT module is an RNN module that provides an explicit and efficient memory mechanism. It models the hidden state of the RNN with a machine learning model, which is updated…
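A minimal NumPy sketch of that idea with a linear inner model: the layer's hidden state is a weight matrix W, updated by a gradient step of a self-supervised loss on each token and then used to produce the output. The k/v/q projections, the per-token single-step update, and the squared inner loss are simplifying assumptions; the paper's TTT layers also use richer inner models (e.g. small MLPs) and batched updates.

```python
import numpy as np

def ttt_linear(tokens, Wk, Wv, Wq, lr=0.1):
    """Sketch of a TTT layer: the RNN state is an inner model's weights."""
    d = tokens.shape[1]
    W = np.zeros((d, d))                 # hidden state = inner model weights
    outputs = []
    for x in tokens:
        k, v, q = Wk @ x, Wv @ x, Wq @ x
        err = W @ k - v                  # inner loss: 0.5 * ||W k - v||^2
        W -= lr * np.outer(err, k)       # one SGD step updates the state
        outputs.append(W @ q)            # read out with the updated model
    return np.stack(outputs)

rng = np.random.default_rng(0)
d, n = 16, 32
Wk, Wv, Wq = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out = ttt_linear(rng.standard_normal((n, d)), Wk, Wv, Wq)
print(out.shape)  # (32, 16)
```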
💙Neo
We loved hosting @SamA for an intimate gathering with @Neo Scholars. Thank you @Alfred_Lin for offering your home! ❤️
Cool to see modern SWE agents taking systems-oriented approaches to reducing large search spaces with different fault-localization strategies. Extremely large state+action spaces seemed to be the greatest choke point on the critical path half a year ago.
Feel like the traditional approach was to build out the fundamentals of your business by ignoring big launches, iterating through assumptions, and testing your way to PMF. That approach feels outdated in the current landscape.