Klemen Kotar
@KlemenKotar
CS PhD Student at Stanford Neuro AI Lab, building large world models
New Preprint: SOTA optical flow extraction from pre-trained generative video models! While it seems intuitive that video models grasp optical flow, extracting that understanding has proven surprisingly elusive.
We prompt a generative video model to extract state-of-the-art optical flow, using zero labels and no fine-tuning. Our method, KL-tracing, achieves SOTA results on TAP-Vid & generalizes to challenging YouTube clips. @khai_loong_aw @KlemenKotar @CristbalEyzagu2 @lee_wanhee_…
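To make the "prompting" idea concrete, here is a minimal, hypothetical sketch of a perturb-and-compare readout suggested by the name KL-tracing: run the frozen video model on a clean and a locally perturbed input, and take the location where the predicted distributions diverge most as the tracked point. The model interface (per-pixel categorical logits), the perturbation form, and every function name below are assumptions for illustration, not the paper's actual recipe.

```python
import torch
import torch.nn.functional as F

def kl_trace_point(video_model, frames, query_xy):
    """Hypothetical sketch of a KL-based point readout from a frozen
    generative video model (zero labels, no fine-tuning).

    Assumptions: `frames` is a [T, C, H, W] tensor and
    `video_model(frames)` returns per-pixel categorical logits of shape
    [T, H, W, K] over a discrete codebook for the predicted frames.
    """
    x, y = query_xy

    with torch.no_grad():
        logits_clean = video_model(frames)           # clean prediction

        perturbed = frames.clone()
        perturbed[0, :, y, x] += 0.5                 # small local "prompt" at the query pixel
        logits_pert = video_model(perturbed)         # perturbed prediction

    # Per-pixel KL(perturbed || clean) on the last predicted frame:
    # the perturbation's influence should peak at the corresponding point.
    log_p = F.log_softmax(logits_pert[-1], dim=-1)   # [H, W, K]
    log_q = F.log_softmax(logits_clean[-1], dim=-1)  # [H, W, K]
    kl_map = (log_p.exp() * (log_p - log_q)).sum(dim=-1)  # [H, W]

    ty, tx = divmod(int(torch.argmax(kl_map)), kl_map.shape[1])
    return (tx - x, ty - y)                          # estimated flow at the query point
```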
what are objects, though? seriously, if i ask you to define where one object begins and another one ends would you have a good answer? is my phone case part of my phone? is my shirt part of my body? maybe it is based on whether i can take it apart and put it back together?…
AI models segment scenes based on how things appear, but babies segment based on what moves together. We use a visual world model our lab has been developing to capture this concept, and what's cool is that it beats SOTA models on zero-shot segmentation and physical…
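As a toy illustration of the "what moves together" intuition (not the lab's world-model approach), one can group pixels purely by their motion: cluster per-pixel flow vectors and call each motion cluster an object, ignoring appearance entirely. The function name, clustering choice, and toy example below are all illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_by_common_motion(flow, n_groups=2):
    """Group pixels by 'common fate': pixels with similar motion get the
    same label, regardless of how they look. `flow` is an [H, W, 2]
    array of per-pixel (dx, dy) displacements from any flow estimator."""
    h, w, _ = flow.shape
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(flow.reshape(-1, 2))
    return labels.reshape(h, w)

# Toy example: a square translating right over a static background
flow = np.zeros((64, 64, 2), dtype=np.float32)
flow[20:40, 20:40, 0] = 2.0                # the square moves +2 px in x
segments = segment_by_common_motion(flow)  # 2 groups: square vs. background
```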
🚨 The era of infinite internet data is ending. So we ask:
👉 What's the right generative modelling objective when data, not compute, is the bottleneck?
TL;DR:
▶️ Compute-constrained? Train Autoregressive models.
▶️ Data-constrained? Train Diffusion models.
Get ready for 🤿 1/n
Concurrent work alert! DiffTrack (arxiv.org/abs/2506.17220) (@jisu__nam, @JunhwaHur, @KimSeungry62571, et al.) is a super cool paper that tackles the same puzzle we do: can you pull out useful signals from a generative video model with zero labels? Their trick is to probe…
Thrilled to announce the 2025 recipients of #KempnerInstitute Research Fellowships: Elom Amemastro, Ruojin Cai, David Clark, Alexandru Damian, William Dorrell, Mark Goldstein, Richard Hakim, Hadas Orgad, Gizem Ozdil, Gabriel Poesia, & Greta Tuckute! bit.ly/3IpzD5E
How to build a thriving open source community by writing code like bacteria do 🦠. Bacterial code (genomes) is:
- small (each line of code costs energy)
- modular (organized into groups of swappable operons)
- self-contained (easily "copy paste-able" via horizontal gene…
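One hypothetical way to read that in code terms: a "gene" or "operon" is a tiny, dependency-free function that can be lifted into any repo unchanged. The example below is just an illustration of the small/modular/self-contained criteria, not something from the thread.

```python
# A self-contained "operon": small, no project-specific imports, no shared
# state, so it can be copy-pasted ("horizontally transferred") anywhere.
def exponential_moving_average(values, alpha=0.1):
    """Smooth a sequence of numbers; stdlib only, a handful of lines."""
    smoothed, current = [], None
    for v in values:
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed
```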
(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation. Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.
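The generic product-of-experts recipe behind that framing is simple to sketch: each expert scores the same candidate space, and their probabilities are multiplied (log-probs summed) at inference time. The interface, weights, and expert set below are assumptions for illustration, not the paper's specific composition scheme.

```python
import torch
import torch.nn.functional as F

def poe_sample(expert_logits, weights=None):
    """Textbook product-of-experts composition at inference time:
    p(x) ∝ ∏_i p_i(x)^{w_i}, i.e. a weighted sum of log-probabilities.
    `expert_logits`: list of [V]-shaped logit tensors over the SAME
    discrete candidate space (e.g. next token / next image patch)."""
    weights = weights or [1.0] * len(expert_logits)
    combined = sum(w * F.log_softmax(l, dim=-1) for w, l in zip(weights, expert_logits))
    return torch.multinomial(F.softmax(combined, dim=-1), num_samples=1)

# Example: compose a generative prior, a VLM-based scorer, and a
# simulator-derived feasibility prior over 1000 shared candidates.
experts = [torch.randn(1000) for _ in range(3)]
choice = poe_sample(experts, weights=[1.0, 0.5, 0.5])
```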
When technology speaks with warmth and flow, it goes beyond feeling like a tool and starts feeling like a human friend. When Advanced Voice was launched, I remember being impressed by how good it sounded. I never imagined that nine months later, as my first project since…
We launched an update to Advanced Voice to make it way more natural and effortless to talk to. Now available to all paid users in ChatGPT.
@aran_nayebi This is super cool. @ChengxuZhuang and I tried something like this in one of the first papers out of my lab … it’s really nice to see this take a great next step. And with real data!
Check out our new work exploring how to make robots sense touch more like our brains! Surprisingly, ConvRNNs aligned best with mouse somatosensory cortex and even passed the NeuroAI Turing Test on current neural data. We also developed new tactile-specific augmentations for…
very cool work!
What are the organizing dimensions of language processing? We show that voxel responses are organized along 2 main axes: processing difficulty & meaning abstractness—revealing an interpretable, topographic representational basis for language processing shared across individuals.
Impressive results! This paper incorporates so many of my favorite things: representational convergence, GANs, cycle-consistency, unpaired translation, etc.
excited to finally share on arxiv what we've known for a while now: All Embedding Models Learn The Same Thing
embeddings from different models are SO similar that we can map between them based on structure alone. without *any* paired data
feels like magic, but it's real: 🧵
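A crude way to build intuition for "mapping by structure alone": describe each item only by its similarity profile inside its own embedding space, then match items across the two spaces by how alike those profiles look, with zero paired examples. This toy matcher is not the paper's method, just the shared-structure intuition; the function names and matching scheme are stand-ins.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_structure(emb_a, emb_b):
    """Toy illustration of aligning two embedding spaces without paired
    data. Assumes both models embedded the same (shuffled) corpus of N
    items; each item is described by its sorted cosine-similarity profile,
    an order-invariant signature of the space's internal structure."""
    def profile(emb):
        e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        return np.sort(e @ e.T, axis=1)            # per-item structural signature

    sig_a, sig_b = profile(emb_a), profile(emb_b)
    cost = np.linalg.norm(sig_a[:, None, :] - sig_b[None, :, :], axis=-1)
    _, cols = linear_sum_assignment(cost)          # Hungarian matching on structure only
    return cols                                    # cols[i]: item in emb_b matched to item i in emb_a
```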
Today, we’re announcing the first major discovery made by our AI Scientist with the lab in the loop: a promising new treatment for dry AMD, a major cause of blindness. Our agents generated the hypotheses, designed the experiments, analyzed the data, iterated, even made figures…