Karan Dalal (@karansdalal) · Berkeley
Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by…
[9 July 2024] Depth is all you need:
- Everybody is sleeping on @lilianweng's latest review of Hallucination Detection/Prevention/Evals
- @ylecun and @reach_vb on MobileLLM takeaways
- Summary of @xiaolonw et al.'s Test-Time Training architecture research
A late issue today due…
Very happy to see the TTT series reaching yet another milestone! This time it serves as an inspiration for a next-generation, post-Transformer architecture, and by connecting TTT to the Transformer, it can explain why (autoregressive) Transformers are so good at in-context learning!
Cannot believe this finally happened! Over the last 1.5 years, we have been developing a new LLM architecture, with linear complexity and expressive hidden states, for long-context modeling. The following plots show our models trained on Books scale better (from 125M to 1.3B)…
One of the neat side effects of initializing from a pre-trained Transformer is that we can generate videos of locations that weren’t in the original Tom and Jerry cartoons. “Around the World” - A 30-second video from earlier in training.
I want to highlight some of the coolest kernel work by @danielkoceja, specifically around making TTT layers fast for training and video generation. TLDR: Our kernel does Tensor Parallel across Streaming Multiprocessors so we can efficiently train an RNN whose hidden state is a…
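For intuition, here is a toy PyTorch simulation (not the released CUDA kernel) of why that parallelism works: with a linear hidden state W, the per-token gradient update touches each column of W independently given the token's k and v, so W can be sharded column-wise across workers (Streaming Multiprocessors in the real kernel) with no inner-loop communication. All names below are illustrative.

```python
import torch

def ttt_inner_loop_column_parallel(ks, vs, qs, num_shards=4, lr=0.1):
    """Toy simulation of the kernel's parallelism: shard the linear hidden state
    W column-wise; each shard runs its per-token gradient updates independently
    (as Streaming Multiprocessors would), and only the readout is gathered."""
    dim = ks[0].shape[0]
    col_slices = torch.arange(dim).chunk(num_shards)      # columns owned by each shard
    shards = [torch.zeros(dim, cols.numel()) for cols in col_slices]
    outputs = []
    for k, v, q in zip(ks, vs, qs):
        out_parts = []
        for W, cols in zip(shards, col_slices):
            err = k @ W - v[cols]                         # local reconstruction error
            W -= lr * torch.outer(k, err)                 # local gradient step, no cross-shard traffic
            out_parts.append(q @ W)                       # local slice of the output
        outputs.append(torch.cat(out_parts))              # gather once per token
    return torch.stack(outputs)
```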

Test-Time Training (TTT) is now on Video! And not just a 5-second video. We can generate a full 1-min video! The TTT module is an RNN module that provides an explicit and efficient memory mechanism. It models the hidden state of an RNN with a machine learning model, which is updated…
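For readers new to the mechanism, here is a minimal PyTorch sketch of a TTT-style layer, assuming the hidden state is a single linear map W updated by one gradient step per token on a simple reconstruction loss between learned views of the token; the paper's actual loss, projections, and mini-batched inner loop differ in detail.

```python
import torch
import torch.nn as nn


class TTTLinearSketch(nn.Module):
    """Sketch of a TTT layer: the hidden state is itself a tiny model (a linear
    map W) that takes one gradient step per token, even at test time."""

    def __init__(self, dim, inner_lr=0.1):
        super().__init__()
        # Outer-loop parameters, trained end-to-end like any other layer.
        self.k_proj = nn.Linear(dim, dim, bias=False)  # "training" view of the token
        self.v_proj = nn.Linear(dim, dim, bias=False)  # reconstruction target
        self.q_proj = nn.Linear(dim, dim, bias=False)  # "test" view used for readout
        self.inner_lr = inner_lr
        self.dim = dim

    def forward(self, x):
        # x: (seq_len, dim); processed token by token for clarity, not speed.
        W = x.new_zeros(self.dim, self.dim)                  # inner-loop state
        outputs = []
        for x_t in x:
            k, v, q = self.k_proj(x_t), self.v_proj(x_t), self.q_proj(x_t)
            err = k @ W - v                                  # inner-loop reconstruction error
            W = W - self.inner_lr * torch.outer(k, err)      # gradient step on 0.5*||kW - v||^2
            outputs.append(q @ W)                            # read out with the updated state
        return torch.stack(outputs)
```

The per-token loop is only for clarity; the custom kernels mentioned above are what make this fast in practice.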
Excited to share our new paper, "One-Minute Video Generation with Test-Time Training (TTT)" in collaboration with NVIDIA. We augment a pre-trained Transformer with TTT-layers and finetune it to generate one-minute Tom and Jerry cartoons with strong temporal and spatial…
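As a rough illustration of what "augmenting a pre-trained Transformer" could look like, the sketch below wraps an existing block and adds the TTT layer from the previous sketch behind a zero-initialized gate, so fine-tuning starts exactly from the pre-trained behavior; the paper's actual placement and gating may differ.

```python
import torch
import torch.nn as nn


class TTTAugmentedBlock(nn.Module):
    """Sketch: add a gated TTT path on top of a pre-trained Transformer block."""

    def __init__(self, pretrained_block, dim):
        super().__init__()
        self.block = pretrained_block                   # existing attention + MLP block
        self.ttt = TTTLinearSketch(dim)                 # TTT layer from the sketch above
        self.gate = nn.Parameter(torch.zeros(dim))      # zero-init: no change before fine-tuning

    def forward(self, x):
        x = self.block(x)                               # original pre-trained computation
        return x + torch.tanh(self.gate) * self.ttt(x)  # gated residual TTT path
```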
Test-Time Compute is currently just inference with chain of thought. We haven’t started doing test-time training - where the model updates its weights to go figure out new things or ingest a ton of new context, without losing generality and raw IQ. Going to be amazing when that happens.
TTT models might be the next frontier in generative AI tcrn.ch/4f4uBHt
This is a great graphic! It also mirrors our choice of parentheses in the title: "Learning to" → the outer loop, i.e. the end-to-end network; "(Learn at Test Time)" → the inner loop, which trains even during inference.
Highly recommend this paper -- here's a slide that helped me understand it
A brilliant idea, paper, and a master class in writing 😍 TL;DR: an RNN where the state is a neural network. The state update is done by learning => inner learning loop = process dynamics, outer learning loop = model training. By Yu Sun, @LeoXinhaoLee, @karansdalal et al. (cf. below)
🔥 New language modelling layer in town:
- more expressive than an RNN
- more efficient (linear complexity) than attention!
Key perspective: LM layers are ML models trained to memorize tokens in a sequence:
- Linear memorizer => RNN
- Kernel memorizer => attention
- Neural memorizer => our layer
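A hedged sketch of that taxonomy, with each "memorizer" answering a query q against the (k, v) pairs seen so far; the forms below are illustrative simplifications, not the exact layers from the paper.

```python
import torch


def linear_memorizer(ks, vs, q):
    """Linear memory: a fixed-size matrix state S = sum_t v_t k_t^T (RNN-like)."""
    S = sum(torch.outer(v, k) for k, v in zip(ks, vs))
    return S @ q


def kernel_memorizer(ks, vs, q):
    """Kernel memory: keep every token and compare with a softmax kernel (attention-like)."""
    weights = torch.softmax(torch.stack([k @ q for k in ks]), dim=0)
    return sum(w * v for w, v in zip(weights, vs))


def neural_memorizer(ks, vs, q, lr=0.1):
    """Neural memory: a model W trained online, one gradient step per token (TTT-like)."""
    W = torch.zeros(q.shape[0], q.shape[0])
    for k, v in zip(ks, vs):
        err = k @ W - v
        W = W - lr * torch.outer(k, err)
    return q @ W
```

All three share the same query-against-history interface, which is what makes them interchangeable as sequence-modeling layers.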
I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models. We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses…
TTT could model long sequences with linear time complexity. It's a drop-in upgrade for any sequence modeling operator, such as self-attention. It has been super fun to work on TTT with the amazing team! Code is available: github.com/test-time-trai…
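Because the layer maps a (seq_len, dim) sequence to a (seq_len, dim) sequence, it satisfies the same interface contract as self-attention. A tiny usage check with the hypothetical sketch class from above (the real implementation lives in the linked repo):

```python
import torch

dim, seq_len = 64, 16
layer = TTTLinearSketch(dim)       # sketch class from above, not the released code
x = torch.randn(seq_len, dim)
y = layer(x)
assert y.shape == x.shape          # same in/out contract a self-attention layer meets
```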