Konpat Ta Preechakul
@konpatp
Learning abstraction from pixels. PhD student at @berkeley_ai. I'm from Bangkok, Thailand 🐘.
Some problems can’t be rushed—they can only be done step by step, no matter how many people or processors you throw at them. We’ve scaled AI by making everything bigger and more parallel: Our models are parallel. Our scaling is parallel. Our GPUs are parallel. But what if the…
Another (more technical) perspective on the Serial Scaling Hypothesis 😉
Thread on the new paper: The Serial Scaling Hypothesis. Joint work with @phizaz, @YutongBAI1002, and Kananart.
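A toy contrast (my own sketch, not code from the paper) of the distinction the thread is pointing at: a big sum splits cleanly across workers, while an iterated map is a chain of dependent steps that no number of processors can shorten. The function names and numbers are illustrative only.

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    # Parallelizable: the total is just the sum of independent chunks,
    # so doubling the workers roughly halves the wall-clock time.
    lo, hi = bounds
    return sum(range(lo, hi))

def logistic_map(x, steps, r=3.9):
    # Inherently serial: each iterate depends on the previous one,
    # so extra workers cannot reduce the number of dependent steps.
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    chunks = [(i * n // workers, (i + 1) * n // workers) for i in range(workers)]
    with ProcessPoolExecutor(workers) as ex:
        total = sum(ex.map(partial_sum, chunks))  # parallel: scales with workers
    final = logistic_map(0.5, n)                  # serial: must run step by step
    print(total, final)
```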
Come to the poster to hear from @itay__yona about our paper paper paper paper paper paper paper paper :)
Ever felt like you're talking to a parrot with a glitch? 🦜 Turns out, LLMs struggle with repetition in a fascinating way! 🕵️‍♂️ We reverse-engineered the circuit responsible for that bug 🤯
Navigation World Models won the Best Paper Honorable Mention Award at #CVPR2025 ☺️ It is my first postdoc paper since joining Yann's lab at @AIatMeta, so I am very excited. It was also extremely fun working with @GaoyueZhou, @dans_t123, @trevordarrell (and @ylecun) Fun story:
Congratulations to the #CVPR2025 Honorable Mentions for Best Paper! @GoogleDeepMind, @UCBerkeley, @UMich, @AIatMeta, @nyuniversity, @berkeley_ai, #AllenInstituteforAI, @UW, #UniversityCollegeLondon, @UniversityLeeds, @ZJU_China, @NTUsg, @PKU1898, @Huawei Singapore Research Center
What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to…
Artifacts in your attention maps? Forgot to train with registers? Use 𝙩𝙚𝙨𝙩-𝙩𝙞𝙢𝙚 𝙧𝙚𝙜𝙞𝙨𝙩𝙚𝙧𝙨! We find that a sparse set of activations sets the artifact positions. We can shift them anywhere ("Shifted"), even outside the image into an untrained token. Clean maps, no retraining.
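A minimal sketch of what a test-time register could look like, under my own assumptions (the function name, the norm-threshold rule for spotting artifact tokens, and the relocation step are all hypothetical, not the paper's code): append one extra token at inference and move the high-norm artifact activations into it.

```python
import torch

def shift_artifacts_to_register(tokens: torch.Tensor, norm_thresh: float = 50.0):
    # tokens: (batch, n_patches, dim) activations at some intermediate ViT layer.
    norms = tokens.norm(dim=-1)                         # (batch, n_patches)
    outliers = norms > norm_thresh                      # assumed artifact positions
    register = torch.zeros_like(tokens[:, :1])          # the new, untrained token
    register[:, 0] = (tokens * outliers.unsqueeze(-1)).sum(dim=1)  # collect mass
    tokens = tokens.masked_fill(outliers.unsqueeze(-1), 0.0)       # clear patches
    return torch.cat([tokens, register], dim=1)         # (batch, n_patches + 1, dim)
```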
Coming up this week (oral at @CVPR): Do We Always Need the Simplicity Bias? We take another step toward understanding why and when neural nets generalize so well. ⬇️🧵
We release Search Arena 🌐 — the first large-scale (24k+) dataset of in-the-wild user interactions with search-augmented LLMs. We also share a comprehensive report on user preferences and model performance in the search-enabled setting. Paper, dataset, and code in 🧵
The last paper of my PhD is finally out! Introducing "Intuitive physics understanding emerges from self-supervised pretraining on natural videos". We show that, without any prior, V-JEPA, a self-supervised video model, develops an understanding of intuitive physics!
Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: arxiv.org/abs/2506.04168 Thread ↓
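A back-of-the-envelope view of why the horizon bites (my illustration under a standard compounding-error assumption, not numbers from the paper): if a policy gets each step right with probability p_step, success over a task of horizon H decays like p_step ** H, so gains in per-step accuracy from more data and compute are eaten exponentially by longer tasks.

```python
# Task-level success under independent per-step errors: p_task ≈ p_step ** H.
for p_step in (0.99, 0.999):
    for H in (10, 100, 1000):
        print(f"p_step={p_step}, H={H}: task success ≈ {p_step ** H:.4f}")
# Even 99.9% per-step accuracy gives only ~37% success at H=1000,
# while 99% per-step accuracy collapses to essentially zero.
```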
Next-gen vision pre-trained models shouldn’t be short-sighted. Humans can easily perceive 10K x 10K resolution. But today’s top vision models—like SigLIP and DINOv2—are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage. Today, we…
[1/8] Is scene understanding solved? We can label pixels and detect objects with high accuracy. But does that mean we truly understand scenes? Super excited to share our new paper and a new task in computer vision: Visual Jenga! 📄arxiv.org/abs/2503.21770…
Decentralized Diffusion Models power stronger models trained on more accessible infrastructure. DDMs mitigate the networking bottleneck that locks training into expensive and power-hungry centralized clusters. They scale gracefully to billions of parameters and generate…
Check out the First Workshop on Mech Interp for Vision at @CVPR! Paper submissions: sites.google.com/view/miv-cvpr2…
🔍 Curious about what's really happening inside vision models? Join us at the First Workshop on Mechanistic Interpretability for Vision (MIV) at @CVPR! 📢 Website: sites.google.com/view/miv-cvpr2… Meet our amazing invited speakers! #CVPR2025 #MIV25 #MechInterp #ComputerVision