Rahul Venkatesh
@Rahul_Venkatesh
CS Ph.D. student at Stanford @NeuroAILab @StanfordAILab
AI models segment scenes based on how things appear, but babies segment based on what moves together. We use a visual world model our lab has been developing to capture this idea, and what's cool is that it beats SOTA models on zero-shot segmentation and physical…
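A minimal sketch of the counterfactual-motion idea, assuming a hypothetical world-model interface `predict_flow` (not the paper's actual API): virtually "poke" one pixel, ask the model which pixels it predicts will move along with it, and group the pixels that reliably co-move.

```python
import numpy as np

def counterfactual_segment(world_model, frame, seed_pixel, n_pokes=8, thresh=0.5):
    """Group pixels the world model predicts move together when seed_pixel
    is counterfactually 'poked'. world_model.predict_flow is an assumed
    interface, not the paper's implementation."""
    H, W = frame.shape[:2]
    votes = np.zeros((H, W))
    for _ in range(n_pokes):
        # Random virtual motion applied at the seed pixel.
        poke = np.random.randn(2)
        # Assumed API: returns predicted per-pixel motion (H, W, 2)
        # under the counterfactual poke.
        flow = world_model.predict_flow(frame, at=seed_pixel, motion=poke)
        moved = np.linalg.norm(flow, axis=-1) > 1e-3
        votes += moved
    # Pixels that consistently move with the seed form one object mask.
    return (votes / n_pokes) > thresh
```

Averaging over several random pokes is one plausible way to make the grouping robust to any single unlucky motion direction.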
Very interesting... a way of thinking about scene segmentation that is both more functional for robotics and category-agnostic (more cognitively grounded, since babies get few category labels), and new models!
Amazing to see all the things @NeuroAILab is doing with counterfactuals + a single "pure" vision foundation model, LRAS. Self-supervised segmentation is my favorite. It gets at a deep philosophical question: what is an object, anyway?
Here's a third application of our new world modeling technology: object grouping. In a sense this completes the video scene understanding trifecta of 3D shape, motion, and now object individuation. From a technical perspective, the core innovation is the idea of…
what are objects, though? seriously, if i ask you to define where one object begins and another one ends, would you have a good answer? is my phone case part of my phone? is my shirt part of my body? maybe it is based on whether i can take it apart and put it back together?…
Over the past 18 months my lab has been developing a new approach to visual world modeling. A magnum opus that ties it all together will be out in the next couple of weeks. But for now, some individual application papers have poked out.
📷 New Preprint: SOTA optical flow extraction from pre-trained generative video models! While it seems intuitive that video models grasp optical flow, extracting that understanding has proven surprisingly elusive.
We prompt a generative video model to extract state-of-the-art optical flow, using zero labels and no fine-tuning. Our method, KL-tracing, achieves SOTA results on TAP-Vid & generalizes to challenging YouTube clips. @khai_loong_aw @KlemenKotar @CristbalEyzagu2 @lee_wanhee_…
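One way to picture KL-tracing, as a rough sketch under assumed interfaces rather than the paper's actual code: inject a tiny perturbation at the query pixel in frame 1, compare the model's next-frame token distributions with and without it, and read the point's destination off the location where the KL divergence spikes.

```python
import torch
import torch.nn.functional as F

def kl_trace(model, frame1, frame2, query_xy, delta=0.2):
    """Sketch of the KL-tracing idea. `model(frame1, frame2)` is an
    assumed interface returning per-location logits over next-frame
    tokens, shape (H, W, vocab); not the released implementation."""
    clean_logits = model(frame1, frame2)
    x, y = query_xy
    perturbed = frame1.clone()
    perturbed[y, x] += delta                    # small counterfactual edit
    pert_logits = model(perturbed, frame2)
    # Per-location KL divergence between perturbed and clean predictions.
    kl = F.kl_div(
        F.log_softmax(pert_logits, dim=-1),
        F.softmax(clean_logits, dim=-1),
        reduction="none",
    ).sum(-1)                                   # (H, W)
    # The KL spike marks where the query point landed in frame 2.
    dest = torch.nonzero(kl == kl.max())[0]
    dx, dy = dest[1].item() - x, dest[0].item() - y
    return dx, dy                               # estimated flow at query_xy
```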
🚀 Excited to share our new paper! We introduce the first autoregressive model that natively handles: 🎥 Novel view synthesis 🎨 Interactive 3D object editing 📏 Depth extraction ➕ and more! No fine-tuning needed—just prompting. Outperforming even diffusion-based methods!
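For intuition, a hedged sketch of the "just prompting" pattern for an autoregressive vision model; every name below (`tokenize`, `task_token`, `generate`, `detokenize`) is hypothetical, not LRAS's real API: tokenize the input image, append a task token, and decode the answer as more tokens.

```python
import torch

def prompt_for_depth(lras, image):
    """Hypothetical illustration of task prompting: the model is asked
    for depth purely by conditioning, with no fine-tuning."""
    rgb_tokens = lras.tokenize(image)                  # (N,) discrete codes
    prompt = torch.cat([rgb_tokens, lras.task_token("depth")])
    depth_tokens = lras.generate(prompt, max_new_tokens=rgb_tokens.numel())
    return lras.detokenize(depth_tokens, kind="depth") # (H, W) depth map
```

Swapping the task token would, under the same assumed interface, steer the same frozen model toward novel view synthesis or editing instead.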