Avery Ryoo
@averyryoo
PhD @Mila_Quebec, prev. @UWaterloo. I like generative models, science, and Toronto sports teams 🇨🇦🇰🇷
Super stoked to share my first first-author paper, which introduces a hybrid architecture for real-time neural decoding. It's been a lot of work, but I'm happy to showcase some very cool results!
New preprint! 🧠🤖 How do we build neural decoders that are: ⚡️ fast enough for real-time use 🎯 accurate across diverse tasks 🌍 generalizable to new sessions, subjects, and species? We present POSSM, a hybrid SSM architecture that optimizes for all three of these axes! 🧵1/7
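Not the POSSM architecture itself, just a minimal sketch of the kind of recurrent state-space block a real-time decoder could build on: the recurrence carries a hidden state across time steps, so each new chunk of neural activity is decoded in constant time per step for streaming use. All names, shapes, and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DiagonalSSMBlock(nn.Module):
    """Toy diagonal state-space recurrence: h_t = a * h_{t-1} + B x_t, y_t = C h_t.
    Illustrative only; not the POSSM architecture."""
    def __init__(self, in_dim: int, state_dim: int, out_dim: int):
        super().__init__()
        self.a_logit = nn.Parameter(torch.zeros(state_dim))  # per-channel decay, kept in (0, 1)
        self.in_proj = nn.Linear(in_dim, state_dim)
        self.out_proj = nn.Linear(state_dim, out_dim)

    def forward(self, x, h=None):
        # x: (batch, time, in_dim); h carries state across chunks for streaming inference
        a = torch.sigmoid(self.a_logit)
        u = self.in_proj(x)
        if h is None:
            h = torch.zeros(x.shape[0], u.shape[-1], device=x.device)
        outs = []
        for t in range(x.shape[1]):          # recurrent form: constant memory per step
            h = a * h + u[:, t]
            outs.append(self.out_proj(h))
        return torch.stack(outs, dim=1), h

# Streaming usage: keep h between chunks of binned spike counts (hypothetical shapes).
decoder = DiagonalSSMBlock(in_dim=96, state_dim=128, out_dim=2)  # e.g. 96 channels -> 2-D cursor velocity
h = None
for chunk in torch.randn(5, 1, 10, 96):      # five 10-bin chunks from one session
    y, h = decoder(chunk, h)
```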
🧵 Everyone is chasing new diffusion models, but what about the representations they model? We introduce Discrete Latent Codes (DLCs): - Discrete representations for diffusion models - SOTA unconditional-generation FID (1.59 on ImageNet) - Compositional generation - Integrates with LLMs 🧱
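For intuition on what a discrete latent code is (a generic vector-quantization sketch; the paper's DLC construction may differ), a continuous encoder output is snapped to its nearest codebook entry, and the resulting sequence of indices is the discrete code a diffusion model or LLM can consume. Codebook size and dimensions here are made up.

```python
import torch

def quantize(z, codebook):
    """Map continuous latents z (N, D) to their nearest codebook entries (K, D).
    Returns discrete indices and the quantized vectors."""
    d = torch.cdist(z, codebook)          # (N, K) pairwise distances
    idx = d.argmin(dim=1)                 # discrete latent code: one token id per latent
    return idx, codebook[idx]

codebook = torch.randn(512, 64)           # K=512 codes of dimension 64 (illustrative)
z = torch.randn(16, 64)                   # encoder outputs for 16 patches
codes, z_q = quantize(z, codebook)        # `codes` is what a generative model conditions on
```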
As #ICML2025 kicks off in Vancouver, our AI talent is being quietly pushed out. 🇨🇦 We've been waiting 28 months for permanent residency, but @CitImmCanada won’t budge. Please read and share our story facebook.com/share/p/1AwU2f… linkedin.com/posts/gbxhuang… #IRCC #AI #Immigration
“do not disturb” is a double negative “turb” is much more elegant
Explicit latents or implicit marginalization? At #ICML2025 📌 Tue, 11 am 📍 East Exhibition Hall A-B (E-1603) Come check out surprising results on whether explicitly incentivizing the learning of correct latents improves generalization over implicitly marginalizing them!
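Writing out the contrast in the title for a model with input $x$, latent $z$, and target $y$ (my notation, not necessarily the paper's): explicit supervision plugs the "correct" latent into the joint, while implicit marginalization sums the latent out.

```latex
\mathcal{L}_{\text{explicit}}(\theta) = -\log p_\theta(y, z^{*} \mid x),
\qquad
\mathcal{L}_{\text{implicit}}(\theta) = -\log \sum_{z} p_\theta(y, z \mid x).
```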
Step 1: Understand how scaling improves LLMs. Step 2: Directly target underlying mechanism. Step 3: Improve LLMs independent of scale. Profit. In our ACL 2025 paper we look at Step 1 in terms of training dynamics. Project: mirandrom.github.io/zsl Paper: arxiv.org/pdf/2506.05447
Excited to announce the Foundation Models for the Brain and Body workshop at #NeurIPS2025!🧠 We invite short papers or interactive demos on AI for neural, physiological or behavioral data. Submit by Aug 22 👉 brainbodyfm-workshop.github.io
(1/n) Sampling from the Boltzmann density better than Molecular Dynamics (MD)? It is possible with PITA 🫓 Progressive Inference Time Annealing! A spotlight at the @genbio_workshop of @icmlconf 2025! PITA learns from "hot," easy-to-explore molecular states 🔥 and then cleverly "cools"…
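A toy illustration of the "hot to cold" idea (plain simulated annealing with Langevin steps on a Boltzmann density, not the PITA algorithm): sample easily at high temperature, then progressively lower the temperature toward the target. The potential and schedule are made up.

```python
import numpy as np

def energy(x):
    return (x**2 - 1.0)**2                # toy double-well potential

def grad_energy(x):
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                 # start from a broad, easy-to-sample distribution
step = 1e-2
for beta in np.linspace(0.2, 5.0, 50):    # anneal inverse temperature: hot -> cold
    for _ in range(20):                   # Langevin steps targeting exp(-beta * energy(x))
        noise = rng.normal(size=x.shape)
        x = x - step * beta * grad_energy(x) + np.sqrt(2.0 * step) * noise
# `x` now approximates samples from the low-temperature Boltzmann density
```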
The worst part about the Panthers winning the Stanley Cup in 6 games is that Leafs fans will believe this makes them the 2nd best team
big fan of this line of work combining diffusion models at inference time - not an expert but the modularity + lack of extra training feels like an intuitive and elegant approach to more controllable generation
Why do we keep sampling from the same distribution the model was trained on? We rethink this old paradigm by introducing Feynman-Kac Correctors (FKCs) – a flexible framework for controlling the distribution of samples at inference time in diffusion models! Without re-training…
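The general flavor of inference-time distribution control, as a plain self-normalized importance-reweighting sketch (not the Feynman-Kac Corrector algorithm itself): draw samples from the frozen model, weight them by a potential such as exp(r(x)/τ), and resample, which tilts the output distribution toward the reward without retraining. `generate` and `reward` are hypothetical stand-ins.

```python
import torch

def tilted_sample(generate, reward, n=256, tau=1.0):
    """Reweight samples from a pretrained model toward exp(reward/tau) * p_model."""
    x = generate(n)                               # (n, ...) samples from the frozen model
    logw = reward(x) / tau                        # log importance weights
    probs = torch.softmax(logw, dim=0)            # self-normalized weights
    idx = torch.multinomial(probs, n, replacement=True)
    return x[idx]                                 # approximate samples from the tilted target
```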
🚀 Our method, Poutine, was the best-performing entry in the 2025 Waymo Vision-based End-to-End Driving Challenge at #CVPR2025! Our 3B-parameter VLM Poutine scored 7.99 RFS on the official test set, comfortably ahead of every other entry (see figure).
Excited that our paper "Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization" was accepted to ICML 2025! We show how Preference Optimization can reduce the impact of noisy concept labels in CBMs. 🧵/9
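A minimal sketch of what a pairwise preference objective over concept predictions could look like (a generic DPO-style margin loss; the paper's exact formulation may differ). Here `s_pos`/`s_neg` are hypothetical concept scores for a preferred vs. dispreferred labeling, and `beta` is a temperature.

```python
import torch
import torch.nn.functional as F

def preference_loss(s_pos, s_neg, beta=1.0):
    """Pairwise preference loss: push preferred concept labelings above dispreferred ones."""
    return -F.logsigmoid(beta * (s_pos - s_neg)).mean()

s_pos = torch.randn(32)   # scores for the preferred (likely correct) concept labels
s_neg = torch.randn(32)   # scores for the dispreferred (likely mislabeled) concept labels
loss = preference_loss(s_pos, s_neg)
```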
(1/n) 🚨You can train a model that solves DFT for any geometry almost without training data!🚨 Introducing Self-Refining Training for Amortized Density Functional Theory: a variational framework for learning a DFT solver that predicts the ground-state solutions for different…
🚗💥Introducing Ctrl-Crash: controllable video generation for autonomous driving! SOTA models struggle to generate physically realistic car crashes. We propose an image2video diffusion model with bounding box and crash type control. Website: anthonygosselin.github.io/Ctrl-Crash-Pro… 🧵->
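One common way to feed bounding-box control into an image-to-video diffusion backbone (a generic conditioning sketch, not necessarily how Ctrl-Crash does it) is to rasterize the boxes into mask channels and concatenate them with the conditioning frame. Shapes and box coordinates are placeholders.

```python
import torch

def bbox_to_mask(boxes, h, w):
    """Rasterize (x1, y1, x2, y2) boxes in pixel coordinates into a single mask channel."""
    mask = torch.zeros(1, h, w)
    for x1, y1, x2, y2 in boxes:
        mask[:, y1:y2, x1:x2] = 1.0
    return mask

frame = torch.randn(3, 256, 256)                      # conditioning image
mask = bbox_to_mask([(40, 60, 120, 140)], 256, 256)   # one controlled vehicle
cond = torch.cat([frame, mask], dim=0)                # (4, 256, 256) conditioning input to the denoiser
```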
Preprint Alert 🚀 Multi-agent reinforcement learning (MARL) often assumes that agents know when other agents cooperate with them. But for humans, this isn’t always true. For example, Plains Indigenous groups used to leave resources for others to use at effigies called Manitokan. 1/8
Is AdamW the best inner optimizer for DiLoCo? Does the inner optimizer affect the compressibility of the DiLoCo delta? Excited to introduce MuLoCo: Muon is a practical inner optimizer for DiLoCo! 🧵arxiv.org/abs/2505.23725 1/N
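For context, a stripped-down single-worker sketch of the DiLoCo structure the tweet refers to: run H local steps with an inner optimizer (AdamW in DiLoCo; Muon is the drop-in studied in MuLoCo), then treat the parameter delta as a pseudo-gradient for an outer optimizer with Nesterov momentum. The model, data, and hyperparameters below are placeholders, and plain SGD stands in for the inner optimizer.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                       # placeholder model
outer = torch.optim.SGD(model.parameters(), lr=0.7, momentum=0.9, nesterov=True)

for outer_round in range(100):
    snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}
    inner = torch.optim.SGD(model.parameters(), lr=1e-2)       # stand-in for AdamW (DiLoCo) or Muon (MuLoCo)
    for _ in range(50):                                        # H inner (local) steps
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        inner.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        inner.step()
    # Outer step: the delta (old - new) acts as a pseudo-gradient applied from the snapshot.
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.grad = snapshot[name] - p.detach()
            p.copy_(snapshot[name])
    outer.step()
```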
Is there a universal strategy to turn any generative model (GANs, VAEs, diffusion models, or flows) into a conditional sampler, or fine-tune it to optimize a reward function? Yes! Outsourced Diffusion Sampling (ODS), accepted to @icmlconf, does exactly that!
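One generic way to get reward-directed samples out of a frozen generator (a latent-space Langevin sketch under my own assumptions, not the ODS algorithm): work in the generator's noise space, ascending a tilted log-density that trades off the reward of the decoded sample against the Gaussian prior. `G` and `reward` are hypothetical differentiable stand-ins.

```python
import torch

def latent_space_langevin(G, reward, z_dim=128, n=64, steps=200, step_size=1e-2):
    """Sample latents z so that x = G(z) scores well under `reward`,
    while staying close to the generator's Gaussian prior."""
    z = torch.randn(n, z_dim, requires_grad=True)
    for _ in range(steps):
        logp = reward(G(z)).sum() - 0.5 * (z ** 2).sum()   # tilted log-density in latent space
        grad, = torch.autograd.grad(logp, z)
        with torch.no_grad():
            z += step_size * grad + (2 * step_size) ** 0.5 * torch.randn_like(z)
    return G(z).detach()
```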
It's very difficult to improve the *exponent* in scaling laws for loss vs compute, especially by changing the optimizer! Our new paper shows that scaling momentum correctly can *provably* improve the scaling exponent on a theoretical model. Empirically, it works on LSTMs too!
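To pin down the two objects in the claim (standard definitions, not the paper's specific scaling rule): the heavy-ball update whose momentum parameter $\beta$ is being scaled, and the compute scaling law whose exponent is at stake. Improving the exponent means increasing $\alpha$, i.e. changing how fast the loss falls with compute $C$, not just shifting the constant $A$.

```latex
v_{t+1} = \beta\, v_t - \eta\, \nabla L(\theta_t), \qquad \theta_{t+1} = \theta_t + v_{t+1},
\qquad\qquad
L(C) \approx L_\infty + A\, C^{-\alpha}.
```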