Karan Dalal (@karansdalal) · Berkeley
Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by…
[9 July 2024] Depth is all you need:
- Everybody is sleeping on @lilianweng's latest review of Hallucination Detection/Prevention/Evals
- @ylecun and @reach_vb on MobileLLM takeaways
- Summary of @xiaolonw et al.'s Test-Time Training architecture research
A late issue today due…
Very happy to see the TTT series reaching yet another milestone! This time it serves as an inspiration for a next-generation, post-Transformer architecture, and by connecting TTT to the Transformer, it can explain why (autoregressive) Transformers are so good at in-context learning!
Cannot believe this finally happened! Over the last 1.5 years, we have been developing a new LLM architecture, with linear complexity and expressive hidden states, for long-context modeling. The following plots show our models trained on Books scale better (from 125M to 1.3B)…
One of the neat side effects of initializing from a pre-trained Transformer is that we can generate videos of locations that weren’t in the original Tom and Jerry cartoons. “Around the World” - A 30-second video from earlier in training.
I want to highlight some of the coolest kernel work by @danielkoceja, specifically around making TTT layers fast for training and video generation. TLDR: Our kernel does Tensor Parallel across Streaming Multiprocessors so we can efficiently train an RNN whose hidden state is a…
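For intuition, here is a toy PyTorch simulation (not the released CUDA kernel) of why that parallelism works: with a linear hidden state W, the per-token gradient update touches each column of W independently given the token's k and v, so W can be sharded column-wise across workers (Streaming Multiprocessors in the real kernel) with no inner-loop communication. All names below are illustrative.

```python
import torch

def ttt_inner_loop_column_parallel(ks, vs, qs, num_shards=4, lr=0.1):
    """Toy simulation of the kernel's parallelism: shard the linear hidden state
    W column-wise; each shard runs its per-token gradient updates independently
    (as Streaming Multiprocessors would), and only the readout is gathered."""
    dim = ks[0].shape[0]
    col_slices = torch.arange(dim).chunk(num_shards)      # columns owned by each shard
    shards = [torch.zeros(dim, cols.numel()) for cols in col_slices]
    outputs = []
    for k, v, q in zip(ks, vs, qs):
        out_parts = []
        for W, cols in zip(shards, col_slices):
            err = k @ W - v[cols]                         # local reconstruction error
            W -= lr * torch.outer(k, err)                 # local gradient step, no cross-shard traffic
            out_parts.append(q @ W)                       # local slice of the output
        outputs.append(torch.cat(out_parts))              # gather once per token
    return torch.stack(outputs)
```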

Test-Time Training (TTT) is now on Video! And not just a 5-second video. We can generate a full 1-min video! The TTT module is an RNN module that provides an explicit and efficient memory mechanism. It models the hidden state of an RNN with a machine learning model, which is updated…
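For readers new to the mechanism, here is a minimal PyTorch sketch of a TTT-style layer, assuming the hidden state is a single linear map W updated by one gradient step per token on a simple reconstruction loss between learned views of the token; the paper's actual loss, projections, and mini-batched inner loop differ in detail.

```python
import torch
import torch.nn as nn


class TTTLinearSketch(nn.Module):
    """Sketch of a TTT layer: the hidden state is itself a tiny model (a linear
    map W) that takes one gradient step per token, even at test time."""

    def __init__(self, dim, inner_lr=0.1):
        super().__init__()
        # Outer-loop parameters, trained end-to-end like any other layer.
        self.k_proj = nn.Linear(dim, dim, bias=False)  # "training" view of the token
        self.v_proj = nn.Linear(dim, dim, bias=False)  # reconstruction target
        self.q_proj = nn.Linear(dim, dim, bias=False)  # "test" view used for readout
        self.inner_lr = inner_lr
        self.dim = dim

    def forward(self, x):
        # x: (seq_len, dim); processed token by token for clarity, not speed.
        W = x.new_zeros(self.dim, self.dim)                  # inner-loop state
        outputs = []
        for x_t in x:
            k, v, q = self.k_proj(x_t), self.v_proj(x_t), self.q_proj(x_t)
            err = k @ W - v                                  # inner-loop reconstruction error
            W = W - self.inner_lr * torch.outer(k, err)      # gradient step on 0.5*||kW - v||^2
            outputs.append(q @ W)                            # read out with the updated state
        return torch.stack(outputs)
```

The per-token loop is only for clarity; the custom kernels mentioned above are what make this fast in practice.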
Excited to share our new paper, "One-Minute Video Generation with Test-Time Training (TTT)" in collaboration with NVIDIA. We augment a pre-trained Transformer with TTT-layers and finetune it to generate one-minute Tom and Jerry cartoons with strong temporal and spatial…
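As a rough illustration of what "augmenting a pre-trained Transformer" could look like, the sketch below wraps an existing block and adds the TTT layer from the previous sketch behind a zero-initialized gate, so fine-tuning starts exactly from the pre-trained behavior; the paper's actual placement and gating may differ.

```python
import torch
import torch.nn as nn


class TTTAugmentedBlock(nn.Module):
    """Sketch: add a gated TTT path on top of a pre-trained Transformer block."""

    def __init__(self, pretrained_block, dim):
        super().__init__()
        self.block = pretrained_block                   # existing attention + MLP block
        self.ttt = TTTLinearSketch(dim)                 # TTT layer from the sketch above
        self.gate = nn.Parameter(torch.zeros(dim))      # zero-init: no change before fine-tuning

    def forward(self, x):
        x = self.block(x)                               # original pre-trained computation
        return x + torch.tanh(self.gate) * self.ttt(x)  # gated residual TTT path
```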
Test-Time Compute is currently just inference with chain of thought. We haven’t started doing test-time training - where the model updates its weights to go figure out new things or ingest a ton of new context, without losing generality and raw IQ. Going to be amazing when that happens.
TTT models might be the next frontier in generative AI tcrn.ch/4f4uBHt
This is a great graphic! It also mirrors our choice of parentheses in the title: "Learning to" → the outer loop, i.e. the end-to-end network; "(Learn at Test Time)" → the inner loop, which trains even during inference.
Highly recommend this paper -- here's a slide that helped me understand it
A brilliant idea, paper, and a master class in writing 😍 TL;DR: an RNN where the state is a neural network. The state update is done by learning => inner learning loop = process dynamics, outer learning loop = model training. By Yu Sun, @LeoXinhaoLee, @karansdalal et al. (cf. below)
🔥 New language modelling layer in town:
- more expressive than an RNN
- more efficient (linear complexity) than attention!
Key perspective: LM layers are ML models trained to memorize tokens in a sequence:
- Linear memorizer => RNN
- Kernel memorizer => attention
- Neural memorizer => our layer
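A hedged sketch of that taxonomy, with each "memorizer" answering a query q against the (k, v) pairs seen so far; the forms below are illustrative simplifications, not the exact layers from the paper.

```python
import torch


def linear_memorizer(ks, vs, q):
    """Linear memory: a fixed-size matrix state S = sum_t v_t k_t^T (RNN-like)."""
    S = sum(torch.outer(v, k) for k, v in zip(ks, vs))
    return S @ q


def kernel_memorizer(ks, vs, q):
    """Kernel memory: keep every token and compare with a softmax kernel (attention-like)."""
    weights = torch.softmax(torch.stack([k @ q for k in ks]), dim=0)
    return sum(w * v for w, v in zip(weights, vs))


def neural_memorizer(ks, vs, q, lr=0.1):
    """Neural memory: a model W trained online, one gradient step per token (TTT-like)."""
    W = torch.zeros(q.shape[0], q.shape[0])
    for k, v in zip(ks, vs):
        err = k @ W - v
        W = W - lr * torch.outer(k, err)
    return q @ W
```

All three share the same query-against-history interface, which is what makes them interchangeable as sequence-modeling layers.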
I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models. We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses…
TTT could model long sequences with linear time complexity. It's a drop-in upgrade for any sequence modeling operator, such as self-attention. It has been super fun to work on TTT with the amazing team! Code is available: github.com/test-time-trai…
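Because the layer maps a (seq_len, dim) sequence to a (seq_len, dim) sequence, it satisfies the same interface contract as self-attention. A tiny usage check with the hypothetical sketch class from above (the real implementation lives in the linked repo):

```python
import torch

dim, seq_len = 64, 16
layer = TTTLinearSketch(dim)       # sketch class from above, not the released code
x = torch.randn(seq_len, dim)
y = layer(x)
assert y.shape == x.shape          # same in/out contract a self-attention layer meets
```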