Daniel Yamins
@dyamins
@StanfordAIlab @neuroailab @stanfordbrain
New paper on 3D scene understanding for static images with a novel large-scale video prediction model. neuroailab.github.io/projects/lras_… Strong results in self-supervised depth extraction, novel view synthesis (aka camera control), and complex object manipulations.

Here's a third application of our new world modeling technology - to object grouping. In a sense this completes the video scene understanding trifecta of 3D shape, motion, and now object individualization. From a technical perspective, the core innovation is the idea of…
AI models segment scenes based on how things appear, but babies segment based on what moves together. We utilize a visual world model that our lab has been developing, to capture this concept — and what's cool is that it beats SOTA models on zero-shot segmentation and physical…
2️⃣ Why PyTorchTNN? Most deep learning frameworks treat recurrence as global. PyTorchTNN lets you flexibly build arbitrary temporal graphs with modular components, where each TNN layer decomposes into: 🔹 Harbor Policy (how inputs combine) 🔹 Pre-/Post-Memory (Conv/Pool/Residual…
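A minimal sketch of what one such modular layer could look like, loosely following the harbor / pre-memory / memory / post-memory decomposition named above (the class and attribute names here are hypothetical illustrations, not the actual PyTorchTNN API):

```python
import torch
import torch.nn as nn

class TNNLayerSketch(nn.Module):
    """Hypothetical decomposition of a single TNN layer into harbor policy,
    pre-memory, local memory, and post-memory stages. Illustrative only."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Harbor policy: how feedforward and feedback inputs are combined
        # (here: simple concatenation along the channel dimension).
        self.harbor = lambda inputs: torch.cat(inputs, dim=1)
        # Pre-memory computation (e.g. a conv block).
        self.pre_memory = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        # Local memory: the layer's own recurrent state update.
        self.memory_gate = nn.Conv2d(out_channels, out_channels, 1)
        # Post-memory computation (e.g. nonlinearity + pooling).
        self.post_memory = nn.Sequential(nn.ReLU(), nn.MaxPool2d(2))

    def forward(self, inputs, state=None):
        x = self.pre_memory(self.harbor(inputs))
        # Simple gated leaky integrator as a stand-in for local recurrence.
        if state is not None:
            x = x + torch.sigmoid(self.memory_gate(state)) * state
        return self.post_memory(x), x
```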
Thanks for taking this idea into the future @aran_nayebi et al. Having true recurrent networks like this is really important from a science perspective, so I’m glad it is continuing to be developed!
1️⃣ What is a TNN? TNNs are neural networks with local recurrence or feedback connections, processing inputs across time. Unlike in standard RNNs, each time step in a TNN corresponds to a single feedforward layer's computation, mimicking biological processing. Of course, you can also…
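A toy illustration of that layer-by-layer unrolling (not the package's actual interface): each layer consumes the output its predecessor produced on the previous time step, so a sweep through N layers takes N time steps to reach the top.

```python
import torch
import torch.nn as nn

# Toy stack of three "layers"; in a real TNN each would also carry its own
# local recurrent state (see the layer sketch above).
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])

outputs = [None] * len(layers)   # each layer's most recent output
x = torch.randn(1, 8)            # input, held constant here for simplicity

for t in range(6):               # unroll for 6 time steps
    new_outputs = list(outputs)
    for i, layer in enumerate(layers):
        inp = x if i == 0 else outputs[i - 1]
        # Layer i can only fire once its predecessor has produced output,
        # so information first reaches layer i at time step i.
        if inp is not None:
            new_outputs[i] = torch.relu(layer(inp))
    outputs = new_outputs
```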
🚀 New Open-Source Release! PyTorchTNN 🚀 A PyTorch package for building biologically-plausible temporal neural networks (TNNs)—unrolling neural network computation layer-by-layer through time, inspired by cortical processing. PyTorchTNN naturally integrates into the…
(4/) To discover such segments, we build SpelkeNet: a visual world model based on the recently introduced local random access sequence modeling (LRAS) paradigm: neuroailab.github.io/projects/lras_…. Our model acquires an implicit understanding of “what moves together” in natural scenes by…
This looks interesting
How do people reason so flexibly about new problems, bringing to bear globally-relevant knowledge while staying locally-consistent? Can we engineer a system that can synthesize bespoke world models (expressed as probabilistic programs) on-the-fly?
We are happy to announce an opening for a Tenure Track Assistant Professor Faculty Position in Neuroscience at EPFL. Join our groups working on cellular & circuit neuroscience & neurocomputation - go.epfl.ch/brain. Deadline Oct 1 2025, Apply now - go.epfl.ch/neurofaculty
These are really amazing positions.
Skeptic!
Practically useful & biologically aligned benchmarks such as this one from @pkoo562 lab consistently show that all the overhyped annotation-agnostic DNA language models are actually terrible for transcriptional regulatory DNA in humans (mammals). 1/
*Easter egg alert* NOT in the published paper. We also benchmarked Evo 2, and while it did better than other gLMs (consistent with scale improving gLMs), it still falls short of a basic CNN trained on one-hot sequences and far short of supervised SOTA. x.com/pkoo562/status…
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
This enables KL-tracing: tracking a dot through video by computing the KL divergence between factual (without dot) and counterfactual (with dot) predictions. Because it compares full predictive distributions rather than single samples, it reasons over all possible future states at once, taming the inherent randomness of generative models.
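A rough sketch of that statistical-counterfactual probe, assuming a model that returns per-patch next-frame logits (the function and argument names are hypothetical placeholders, not the released code):

```python
import torch
import torch.nn.functional as F

def kl_trace_step(model, frames, frames_with_dot):
    """Locate the tracer dot in the predicted next frame by comparing factual
    vs. counterfactual next-patch predictions (illustrative sketch only).

    Assumes `model(frames)` returns logits over the patch codebook for the
    next frame, shaped (num_patches, vocab_size).
    """
    with torch.no_grad():
        logits_factual = model(frames)            # clean prediction, no dot
        logits_counter = model(frames_with_dot)   # counterfactual, dot painted in
    log_p = F.log_softmax(logits_counter, dim=-1)
    log_q = F.log_softmax(logits_factual, dim=-1)
    # Per-patch KL(counterfactual || factual): the patches whose predicted
    # distribution shifts the most are where the dot is expected to land.
    kl_per_patch = (log_p.exp() * (log_p - log_q)).sum(dim=-1)
    return kl_per_patch.argmax().item()           # index of most-shifted patch
```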
Enter the LRAS (Local Random Access Sequence) model - a generative video model that checks all the boxes. Beyond tight conditioning, it predicts the distribution over ALL possible values at the next patch (like LLMs), capturing the superposition of all probable tracer dot states.
FINALLY: KL-tracing works by computing KL-divergence between clean & perturbed logit distributions. This is a powerful *statistical counterfactual* probe enabled by autoregressive generative predictors (like LRAS).
📷 New Preprint: SOTA optical flow extraction from pre-trained generative video models! While it seems intuitive that video models grasp optical flow, extracting that understanding has proven surprisingly elusive.
We prompt a generative video model to extract state-of-the-art optical flow, using zero labels and no fine-tuning. Our method, KL-tracing, achieves SOTA results on TAP-Vid & generalizes to challenging YouTube clips. @khai_loong_aw @KlemenKotar @CristbalEyzagu2 @lee_wanhee_…
Super stoked for our Minds in the Making workshop at @cogscisociety.bsky.social 2025! If you are at all interested in the intersection between cognitive science and design, you won’t want to miss it!! 🧠🛠️
Delighted to announce our CogSci '25 workshop at the interface between cognitive science and design 🧠🖌️! We're calling it: Minds in the Making 🏺 minds-making.github.io Register now! June – July 2025, free & open to the public. (all career stages, all disciplines)