Valentin De Bortoli (@ValentinDeBort1), research scientist at DeepMind London.
I've just learned so much from this playground today. For example, here is what happens if you do gradient descent on a ReLU MLP, even with the muP setup. It shows how the model mostly optimizes the last layers at first; once it gets those, the early layers get updated (meaningful signal…
ReLU MLP with width / depth going to infinity. Note how different parameterizations produce pathological scaling behavior (yellow / blue on the activations / weight gradients). muP fixes this.
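For intuition, here is a minimal NumPy sketch (mine, not the playground's code) of one way the pathology shows up: with standard-parameterization (SP) init and a global learning rate, the per-coordinate change in hidden features after one SGD step shrinks like 1/sqrt(width), whereas a muP-style setup (an extra 1/width readout multiplier plus a width-scaled hidden learning rate) keeps it width-independent. The two-layer net and the exact scalings below are illustrative assumptions, not the playground's configuration.

```python
# Toy sketch: how per-coordinate feature updates scale with width under
# standard parameterization (SP) vs muP, for f(x) = v . relu(W x) after
# one SGD step on a scalar regression loss.
import numpy as np

rng = np.random.default_rng(0)

def feature_update(width, mup, lr=0.01, dim=64):
    x = rng.normal(size=dim)                          # O(1) input coordinates
    W = rng.normal(size=(width, dim)) / np.sqrt(dim)  # hidden init: var 1/fan_in
    # SP readout: var 1/width; muP readout: extra 1/sqrt(width) multiplier
    v = rng.normal(size=width) / (width if mup else np.sqrt(width))
    h = np.maximum(W @ x, 0.0)
    f, y = v @ h, 1.0
    # one SGD step on L = (f - y)^2 / 2; muP scales the hidden LR by width
    gW = (f - y) * np.outer(v * (h > 0), x)
    dW = -(lr * width if mup else lr) * gW
    dh = np.maximum((W + dW) @ x, 0.0) - h            # feature movement
    return np.abs(dh).mean()

for n in [256, 1024, 4096, 16384]:
    print(n, feature_update(n, mup=False), feature_update(n, mup=True))
```

Running it, the mup=False column decays as the width grows while the mup=True column stays roughly flat, i.e. feature learning survives the infinite-width limit only under the muP scaling.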
Interested in foundational aspects? Waiting on, or unhappy about, your NeurIPS reviews? Please consider the NeurIPS workshop DynaFront: Dynamics at the Frontiers of Optimization, Sampling, and Games sites.google.com/view/dynafront… @yuejiec @Andrea__M @btreetaiji @T_Chavdarova ++ Sponsors appreciated!
Thrilled to finally release this study! 🚀 We view (discrete) diffusion models as implicitly doing data augmentation over autoregressive models. Through this lens, we find that diffusion outperforms AR in data-constrained settings, but it requires larger models and way more epochs to…
🚨 The era of infinite internet data is ending. So we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
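To make the "implicit data augmentation" intuition concrete, here is a hedged toy sketch (my illustration, not the paper's code): an AR model sees each sequence under a single left-to-right factorization, while a masked discrete diffusion model trains on random masked views of the same sequence, so one example yields many distinct prediction problems that can be reused across epochs.

```python
# Toy contrast: AR's fixed factorization vs diffusion-style random masked
# views of the same training sequence.
import random

MASK = "<m>"

def ar_views(tokens):
    # the single left-to-right factorization: predict tokens[t] from tokens[:t]
    return [(tokens[:t], tokens[t]) for t in range(len(tokens))]

def diffusion_views(tokens, n_views=3, seed=0):
    rng = random.Random(seed)
    views = []
    for _ in range(n_views):
        rate = rng.random()                       # masking level ~ noise level
        keep = [tok if rng.random() > rate else MASK for tok in tokens]
        targets = [t for t, k in zip(tokens, keep) if k == MASK]
        views.append((keep, targets))             # predict the masked tokens
    return views

seq = ["the", "cat", "sat", "on", "the", "mat"]
print(ar_views(seq)[:2])
for view in diffusion_views(seq):
    print(view)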
If you're still around #ICML2025, please consider checking out my collaborator @qu_1006's Oral in the MemFM Workshop, 11am Sat, West Meeting Room 223-224, on A Closer Look at Model Collapse (in diffusion models): From a Generalization-to-Memorization Perspective
I won’t be at ICML, but check out accepted papers with folks from my former team at MLR - Projective Composition of Diffusion Models, led by @ArwenBradley and @PreetumNakkiran icml.cc/virtual/2025/p…
I’m at #ICML2025 this week! Will be presenting Mechanism of Projective Composition of Diffusion Models tomorrow afternoon. Stop by poster E-3105 to see oil paintings of @PreetumNakkiran’s dog!
There are many great researchers out there. But the ones that really stand out to me are the ones who are also kind, even when they don't need to be.
Accelerated Diffusion Models via Speculative Sampling at #icml25! 16:30 Tuesday, July 15, poster E-3012 arxiv.org/abs/2501.05370 @ValentinDeBort1 @agalashov @ArnaudDoucet1
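For readers new to the idea, here is a minimal sketch of the classic token-level speculative-sampling accept/reject rule that this line of work adapts to diffusion; this is the generic LLM version, not the diffusion-specific algorithm in the paper, and p and q below are toy stand-ins for the target and draft distributions.

```python
# Classic speculative sampling: draw from a cheap draft q, accept with
# probability min(1, p/q); on rejection, resample from the renormalized
# residual max(p - q, 0). The output is exactly distributed as p.
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p, q):
    x = rng.choice(len(q), p=q)                 # cheap draft proposal
    if rng.random() < min(1.0, p[x] / q[x]):    # accept test
        return x, True
    residual = np.maximum(p - q, 0.0)           # rejection correction
    residual /= residual.sum()
    return rng.choice(len(p), p=residual), False

p = np.array([0.5, 0.3, 0.2])   # target distribution
q = np.array([0.6, 0.2, 0.2])   # draft distribution
samples = [speculative_step(p, q)[0] for _ in range(20000)]
print(np.bincount(samples) / len(samples))      # empirically ~ p
```

The accept-then-residual construction guarantees unbiasedness, so the draft model only buys speed, never changes the sampled distribution.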
Attending ICML ✈️Tues-Fri to present "The Diffusion Duality" 🗓️Wed, July 16 @ 4:30pm 📍East Exhibition Hall A-B (E-3003) DM if you want to chat about diffusion LMs, or my current work on Duality or Esoteric LMs! x.com/ssahoo_/status…
🚨 “The Diffusion Duality” is out! @ICML2025 ⚡️ Few-step generation in discrete diffusion language models by exploiting the underlying Gaussian diffusion. 🦾Beats AR on 3/7 zero-shot likelihood benchmarks. 📄 Paper: arxiv.org/abs/2506.10892 💻 Code: github.com/s-sahoo/duo 🧠…
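A toy illustration of the duality being exploited, under my reading of the abstract (not the paper's exact construction): pushing a Gaussian diffusion on one-hot token embeddings through an argmax yields a discrete corruption process whose flip rate grows with the Gaussian noise level.

```python
# Toy "duality" picture: Gaussian noise on a one-hot vector, read out via
# argmax, behaves like a discrete corruption whose rate tracks sigma.
import numpy as np

rng = np.random.default_rng(0)
V, n = 8, 50000                        # vocab size, Monte Carlo samples
x0 = np.zeros(V); x0[3] = 1.0          # clean token 3 as a one-hot

for sigma in [0.1, 0.5, 1.0, 2.0]:
    xt = x0 + sigma * rng.normal(size=(n, V))   # Gaussian diffusion state
    tokens = xt.argmax(axis=1)                  # discrete dual state
    print(sigma, (tokens != 3).mean())          # corruption rate rises
```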
a stark demonstration that a Caltech PhD and partnership at a top-tier venture capital firm are no remedy for profound ignorance
It’s not that people think calculus or math is useless in AI. They’re just tired of theory folks who never touch code, never scale a model, and still argue they’re solving problems in AI:) If theory becomes detached from practice, the world will treat it like noise and that’s on…
See below on what Zuckerberg is looking for in star recruits worth $100m pay packages for Meta’s plans in Artificial Intelligence. But weren’t some people saying calculus is no longer useful in the AI age? 🤔
I am very happy to share Orbformer, a foundation model for wavefunctions using deep QMC that offers a route to tackle strongly correlated quantum states! arxiv.org/abs/2506.19960
Small plug, not really advertised, but we similarly showed how to perform temperature-based control and composition of separately trained diffusion models via SMC and the Feynman-Kac model formalism, with score distillation of the energy, at AISTATS last year - Diversity control…
Why do we keep sampling from the same distribution the model was trained on? We rethink this old paradigm by introducing Feynman-Kac Correctors (FKCs) – a flexible framework for controlling the distribution of samples at inference time in diffusion models! Without re-training…
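For a feel of the mechanism, here is a hedged toy of the Feynman-Kac / SMC reweight-and-resample skeleton that correctors like these build on; it is not the FKC algorithm itself. It temperature-controls a known 1-D mixture by weighting exact base samples with the potential p(x)**(beta - 1):

```python
# Toy Feynman-Kac-style reweighting: turn samples from a base density p
# into samples from the tempered density p**beta / Z via importance
# weights plus multinomial resampling.
import numpy as np

rng = np.random.default_rng(0)

def p_density(x):
    # base: equal mixture of N(-2, 0.5^2) and N(+2, 0.5^2)
    g = lambda m: np.exp(-0.5 * ((x - m) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))
    return 0.5 * g(-2.0) + 0.5 * g(2.0)

n, beta = 20000, 3.0                         # beta > 1 sharpens the modes
means = np.where(rng.random(n) < 0.5, -2.0, 2.0)
x = means + 0.5 * rng.normal(size=n)         # exact samples from p
w = p_density(x) ** (beta - 1.0)             # Feynman-Kac-style potential
w /= w.sum()
x_temp = x[rng.choice(n, size=n, p=w)]       # resample -> approx p^beta / Z
print(x[x > 0].std(), x_temp[x_temp > 0].std())  # within-mode spread shrinks
```

The printed within-mode spread shrinks by roughly 1/sqrt(beta) after resampling, as tempering predicts; FKC-style methods extend this kind of reweighting along the whole diffusion sampling path, without retraining.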
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models @ChrisWu6080 @RuiqiGao @poolio @alextrevith ChangxiZheng @jon_barron @holynski_
Excited to share our recent work on how we can flexibly combine visual generative models, VLMs, and simulators for visual synthesis! This enables physics-engine-controlled video generation, graphics-engine-controlled image generation, and compositional image synthesis!
(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.
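As a reminder of what a product of experts does at its simplest, here is a hedged 1-D toy (my illustration, not the paper's framework): multiply expert densities, i.e. add log-densities, and renormalize, so each expert acts as a soft constraint on the result.

```python
# Toy product of experts on a 1-D grid: the combined density is the
# normalized pointwise product of the expert densities.
import numpy as np

x = np.linspace(-6, 6, 2001)

def gauss(m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2)

expert_a = gauss(-1.0, 2.0)          # e.g. a broad generative prior
expert_b = gauss(2.0, 1.0)           # e.g. a constraint / reward expert
poe = expert_a * expert_b
poe /= poe.sum()                     # discrete renormalization on the grid
print(x[poe.argmax()])               # mode ~ 1.4, the precision-weighted mean
```

For two Gaussians the product's mode lands at the precision-weighted mean of the experts; the paper's contribution is doing this kind of composition at inference time across heterogeneous, high-dimensional models.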
Really enjoyed the “Visual Generative Modeling: what’s after diffusion?” workshop @CVPR today. Highly recommended.
[1/9]🚀Excited to share our new work, RNE! A plug-and-play framework for everything about diffusion model density and control: density estimation, inference-time control & scaling, energy regularisation. More details👇 Joint work with @jmhernandez233 @YuanqiD, Francisco Vargas
🚀🚀🚀 Sharing a new exciting work! Nature teaches us to think not only forward, but also backward. The counterintuitive backward process gives so much strength, not only in understanding out-of-equilibrium processes, but also in controlling and estimating with diffusion models!