Klemen Kotar
@KlemenKotar
CS PhD Student at Stanford Neuro AI Lab, building large world models
New Preprint: SOTA optical flow extraction from pre-trained generative video models! While it seems intuitive that video models grasp optical flow, extracting that understanding has proven surprisingly elusive.
We prompt a generative video model to extract state-of-the-art optical flow, using zero labels and no fine-tuning. Our method, KL-tracing, achieves SOTA results on TAP-Vid & generalizes to challenging YouTube clips. @khai_loong_aw @KlemenKotar @CristbalEyzagu2 @lee_wanhee_…
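To make the "prompting" idea concrete, here is a minimal, hypothetical sketch of a perturb-and-compare readout suggested by the name KL-tracing: run the frozen video model on a clean and a locally perturbed input, and take the location where the predicted distributions diverge most as the tracked point. The model interface (per-pixel categorical logits), the perturbation form, and every function name below are assumptions for illustration, not the paper's actual recipe.

```python
import torch
import torch.nn.functional as F

def kl_trace_point(video_model, frames, query_xy):
    """Hypothetical sketch of a KL-based point readout from a frozen
    generative video model (zero labels, no fine-tuning).

    Assumptions: `frames` is a [T, C, H, W] tensor and
    `video_model(frames)` returns per-pixel categorical logits of shape
    [T, H, W, K] over a discrete codebook for the predicted frames.
    """
    x, y = query_xy

    with torch.no_grad():
        logits_clean = video_model(frames)           # clean prediction

        perturbed = frames.clone()
        perturbed[0, :, y, x] += 0.5                 # small local "prompt" at the query pixel
        logits_pert = video_model(perturbed)         # perturbed prediction

    # Per-pixel KL(perturbed || clean) on the last predicted frame:
    # the perturbation's influence should peak at the corresponding point.
    log_p = F.log_softmax(logits_pert[-1], dim=-1)   # [H, W, K]
    log_q = F.log_softmax(logits_clean[-1], dim=-1)  # [H, W, K]
    kl_map = (log_p.exp() * (log_p - log_q)).sum(dim=-1)  # [H, W]

    ty, tx = divmod(int(torch.argmax(kl_map)), kl_map.shape[1])
    return (tx - x, ty - y)                          # estimated flow at the query point
```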
what are objects, though? seriously, if i ask you to define where one object begins and another one ends would you have a good answer? is my phone case part of my phone? is my shirt part of my body? maybe it is based on whether i can take it apart and put it back together?…
AI models segment scenes based on how things appear, but babies segment based on what moves together. We use a visual world model our lab has been developing to capture this concept, and what's cool is that it beats SOTA models on zero-shot segmentation and physical…
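As a toy illustration of the "what moves together" intuition (not the lab's world-model approach), one can group pixels purely by their motion: cluster per-pixel flow vectors and call each motion cluster an object, ignoring appearance entirely. The function name, clustering choice, and toy example below are all illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_by_common_motion(flow, n_groups=2):
    """Group pixels by 'common fate': pixels with similar motion get the
    same label, regardless of how they look. `flow` is an [H, W, 2]
    array of per-pixel (dx, dy) displacements from any flow estimator."""
    h, w, _ = flow.shape
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(flow.reshape(-1, 2))
    return labels.reshape(h, w)

# Toy example: a square translating right over a static background
flow = np.zeros((64, 64, 2), dtype=np.float32)
flow[20:40, 20:40, 0] = 2.0                # the square moves +2 px in x
segments = segment_by_common_motion(flow)  # 2 groups: square vs. background
```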
🚨 The era of infinite internet data is ending. So we ask:
👉 What's the right generative modelling objective when data, not compute, is the bottleneck?
TL;DR:
▶️ Compute-constrained? Train Autoregressive models.
▶️ Data-constrained? Train Diffusion models.
Get ready for 🤿 1/n
Concurrent work alert! DiffTrack (arxiv.org/abs/2506.17220) (@jisu__nam, @JunhwaHur, @KimSeungry62571, et al.) is a super cool paper that tackles the same puzzle we do: can you pull out useful signals from a generative video model with zero labels? Their trick is to probe…
Thrilled to announce the 2025 recipients of #KempnerInstitute Research Fellowships: Elom Amemastro, Ruojin Cai, David Clark, Alexandru Damian, William Dorrell, Mark Goldstein, Richard Hakim, Hadas Orgad, Gizem Ozdil, Gabriel Poesia, & Greta Tuckute! bit.ly/3IpzD5E
How to build a thriving open source community by writing code like bacteria do 🦠. Bacterial code (genomes) is:
- small (each line of code costs energy)
- modular (organized into groups of swappable operons)
- self-contained (easily "copy paste-able" via horizontal gene…
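One hypothetical way to read that in code terms: a "gene" or "operon" is a tiny, dependency-free function that can be lifted into any repo unchanged. The example below is just an illustration of the small/modular/self-contained criteria, not something from the thread.

```python
# A self-contained "operon": small, no project-specific imports, no shared
# state, so it can be copy-pasted ("horizontally transferred") anywhere.
def exponential_moving_average(values, alpha=0.1):
    """Smooth a sequence of numbers; stdlib only, a handful of lines."""
    smoothed, current = [], None
    for v in values:
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed
```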
(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation. Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.
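The generic product-of-experts recipe behind that framing is simple to sketch: each expert scores the same candidate space, and their probabilities are multiplied (log-probs summed) at inference time. The interface, weights, and expert set below are assumptions for illustration, not the paper's specific composition scheme.

```python
import torch
import torch.nn.functional as F

def poe_sample(expert_logits, weights=None):
    """Textbook product-of-experts composition at inference time:
    p(x) ∝ ∏_i p_i(x)^{w_i}, i.e. a weighted sum of log-probabilities.
    `expert_logits`: list of [V]-shaped logit tensors over the SAME
    discrete candidate space (e.g. next token / next image patch)."""
    weights = weights or [1.0] * len(expert_logits)
    combined = sum(w * F.log_softmax(l, dim=-1) for w, l in zip(weights, expert_logits))
    return torch.multinomial(F.softmax(combined, dim=-1), num_samples=1)

# Example: compose a generative prior, a VLM-based scorer, and a
# simulator-derived feasibility prior over 1000 shared candidates.
experts = [torch.randn(1000) for _ in range(3)]
choice = poe_sample(experts, weights=[1.0, 0.5, 0.5])
```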
When technology speaks with warmth and flow, it goes beyond feeling like a tool and starts feeling like a human friend. When Advanced Voice was launched, I remember being impressed by how good it sounded. I never imagined that nine months later, as my first project since…
We launched an update to Advanced Voice to make it way more natural and effortless to talk to. Now available to all paid users in ChatGPT.
@aran_nayebi This is super cool. @ChengxuZhuang and I tried something like this in one of the first papers out of my lab … it’s really nice to see this take a great next step. And with real data!
Check out our new work exploring how to make robots sense touch more like our brains! Surprisingly, ConvRNNs aligned best with mouse somatosensory cortex and even passed the NeuroAI Turing Test on current neural data. We also developed new tactile-specific augmentations for…
very cool work!
What are the organizing dimensions of language processing? We show that voxel responses are organized along 2 main axes: processing difficulty & meaning abstractness—revealing an interpretable, topographic representational basis for language processing shared across individuals.
Impressive results! This paper incorporates so many of my favorite things: representational convergence, GANs, cycle-consistency, unpaired translation, etc.
excited to finally share on arxiv what we've known for a while now: All Embedding Models Learn The Same Thing
embeddings from different models are SO similar that we can map between them based on structure alone. without *any* paired data
feels like magic, but it's real: 🧵
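A crude way to build intuition for "mapping by structure alone": describe each item only by its similarity profile inside its own embedding space, then match items across the two spaces by how alike those profiles look, with zero paired examples. This toy matcher is not the paper's method, just the shared-structure intuition; the function names and matching scheme are stand-ins.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_structure(emb_a, emb_b):
    """Toy illustration of aligning two embedding spaces without paired
    data. Assumes both models embedded the same (shuffled) corpus of N
    items; each item is described by its sorted cosine-similarity profile,
    an order-invariant signature of the space's internal structure."""
    def profile(emb):
        e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        return np.sort(e @ e.T, axis=1)            # per-item structural signature

    sig_a, sig_b = profile(emb_a), profile(emb_b)
    cost = np.linalg.norm(sig_a[:, None, :] - sig_b[None, :, :], axis=-1)
    _, cols = linear_sum_assignment(cost)          # Hungarian matching on structure only
    return cols                                    # cols[i]: item in emb_b matched to item i in emb_a
```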
Today, we’re announcing the first major discovery made by our AI Scientist with the lab in the loop: a promising new treatment for dry AMD, a major cause of blindness. Our agents generated the hypotheses, designed the experiments, analyzed the data, iterated, even made figures…