Yipeng Zhang
@yipengzz
Preprint Alert 🚀 Multi-agent reinforcement learning (MARL) often assumes that agents know when other agents cooperate with them. But for humans, this isn’t always true. For example, Plains Indigenous groups would leave resources for others to use at effigies called Manitokan. 1/8
Super stoked to share my first first-author paper, which introduces a hybrid architecture for real-time neural decoding. It's been a lot of work, but happy to showcase some very cool results!
New preprint! 🧠🤖 How do we build neural decoders that are: ⚡️ fast enough for real-time use 🎯 accurate across diverse tasks 🌍 generalizable to new sessions, subjects, and species? We present POSSM, a hybrid SSM architecture that optimizes for all three of these axes! 🧵1/7
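To make the "hybrid" idea concrete, here is a minimal sketch in PyTorch, with hypothetical names and a GRU standing in for the paper's SSM block (the actual POSSM pairs its own input encoder with an SSM backbone). The point it illustrates is the recurrence: inference cost per new spike bin is constant, which is what makes real-time use plausible.

```python
# Minimal sketch of a hybrid recurrent spike decoder (hypothetical names;
# a GRU stands in for the SSM block described in the paper).
import torch
import torch.nn as nn

class HybridSpikeDecoder(nn.Module):
    def __init__(self, n_channels: int, d_model: int = 128, n_outputs: int = 2):
        super().__init__()
        self.embed = nn.Linear(n_channels, d_model)              # per-bin spike-count embedding
        self.core = nn.GRU(d_model, d_model, batch_first=True)   # stand-in for an SSM core
        self.readout = nn.Linear(d_model, n_outputs)             # e.g., 2-D cursor velocity

    def forward(self, spikes, h=None):
        # spikes: (batch, time, n_channels) binned spike counts
        z = self.embed(spikes)
        z, h = self.core(z, h)        # recurrent state => O(1) cost per new bin
        return self.readout(z), h     # carry `h` across calls for streaming use

model = HybridSpikeDecoder(n_channels=96)
y, h = model(torch.randn(1, 50, 96))             # offline: 50 bins at once
y_next, h = model(torch.randn(1, 1, 96), h)      # real-time: one bin at a time
```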
Is there a universal strategy to turn any generative model (GANs, VAEs, diffusion models, or flows) into a conditional sampler, or to fine-tune it to optimize a reward function? Yes! Outsourced Diffusion Sampling (ODS), accepted to @icmlconf, does exactly that!
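The tweet doesn't spell out the mechanism, so here is a conceptual baseline rather than ODS itself: self-normalized importance sampling over the frozen generator's latent space. `generator` and `reward` are hypothetical stand-ins. My reading is that ODS amortizes this same "work in latent space" idea with a trained diffusion sampler, avoiding the high variance of importance weights in high dimensions.

```python
# NOT the ODS algorithm: a self-normalized importance-sampling (SNIS) baseline
# illustrating the idea of outsourcing conditioning to the latent space of a
# frozen generator. Target: posterior over latents proportional to N(0,I) * r(G(z)).
import torch

def snis_conditional_samples(generator, reward, n=4096, k=8, latent_dim=64):
    z = torch.randn(n, latent_dim)        # samples from the latent prior
    x = generator(z)                      # push through the frozen generator
    logw = reward(x)                      # log r(x): unnormalized posterior tilt
    w = torch.softmax(logw, dim=0)        # self-normalized importance weights
    idx = torch.multinomial(w, k, replacement=True)
    return x[idx]                         # approximate conditional samples

W = torch.randn(64, 2)                                            # toy "generator"
G = lambda z: z @ W
r = lambda x: -(x - torch.tensor([1.0, 1.0])).pow(2).sum(-1)      # log-reward near (1, 1)
samples = snis_conditional_samples(G, r)
```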
Is AdamW the best inner optimizer for DiLoCo? Does the inner optimizer affect the compressibility of the DiLoCo delta? Excited to introduce MuLoCo: Muon is a practical inner optimizer for DiLoCo! 🧵arxiv.org/abs/2505.23725 1/N
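For readers new to DiLoCo, here is a rough sketch of the inner/outer structure the question is about; helper names are hypothetical, and I follow the reference setup of Nesterov momentum as the outer optimizer. `make_inner_opt` is exactly the knob MuLoCo turns (AdamW vs. Muon), and `delta` is the pseudo-gradient whose compressibility the thread asks about.

```python
# Rough sketch of one DiLoCo-style communication round (hypothetical names).
import copy
import torch

def diloco_round(global_model, workers_data, make_inner_opt, inner_steps,
                 outer_lr=0.7, outer_momentum=0.9, momentum_buf=None):
    deltas = []
    for data in workers_data:                      # each worker starts from the global weights
        local = copy.deepcopy(global_model)
        opt = make_inner_opt(local.parameters())   # AdamW in DiLoCo; Muon in MuLoCo
        for _ in range(inner_steps):
            loss = local(next(data))               # assume the model returns its loss
            opt.zero_grad(); loss.backward(); opt.step()
        deltas.append([gp.data - lp.data           # pseudo-gradient: global - local
                       for gp, lp in zip(global_model.parameters(),
                                          local.parameters())])
    # Outer step: Nesterov-style momentum on the worker-averaged delta.
    avg = [torch.stack(ds).mean(0) for ds in zip(*deltas)]
    if momentum_buf is None:
        momentum_buf = [torch.zeros_like(a) for a in avg]
    for p, a, m in zip(global_model.parameters(), avg, momentum_buf):
        m.mul_(outer_momentum).add_(a)
        p.data.add_(a + outer_momentum * m, alpha=-outer_lr)
    return momentum_buf
```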
🚨 Preprint Alert 🚀 📄 seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models arxiv.org/abs/2505.03176 Can we simultaneously learn both transformation-invariant and transformation-equivariant representations with self-supervised learning (SSL)?…
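A minimal sketch of the JEPA-style objective as I read the abstract, with hypothetical shapes and names rather than the exact seq-JEPA model: an encoder maps each transformed view to a representation, and a causal predictor, conditioned on the transformation (action) applied, predicts the next view's representation. Action-conditioning is what gives the equivariant axis; the predictor's aggregate state is pushed toward invariance.

```python
# JEPA-style invariant/equivariant sketch (hypothetical toy shapes).
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(32 * 32, 256), nn.ReLU(), nn.Linear(256, 128))
act_emb = nn.Linear(4, 128)       # embed transformation params (e.g., rotation, crop)
layer = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
predictor = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(128, 128)

views = torch.randn(8, 5, 32 * 32)             # batch of 5 transformed views each
actions = torch.randn(8, 5, 4)                 # transformation taking view t to t+1
z = enc(views)                                 # (8, 5, 128) per-view representations
tokens = z[:, :-1] + act_emb(actions[:, 1:])   # condition each step on its action
mask = nn.Transformer.generate_square_subsequent_mask(4)   # causal = autoregressive
pred = head(predictor(tokens, mask=mask))
target = z[:, 1:].detach()                     # predict next-view representation (stop-grad)
loss = ((pred - target) ** 2).mean()
```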
How can we make recommender systems more transparent and controllable without sacrificing quality? Introducing TEARS, a scrutable recommender system that replaces numerical user profiles with editable text summaries (accepted at @TheWebConf). arxiv.org/abs/2410.19302 1/🧵
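To show why a text profile buys controllability, here is a bare-bones sketch with hypothetical names (TEARS' actual architecture and training objective are in the paper): the user vector comes from an editable summary string, so editing the text directly re-ranks the items.

```python
# Sketch of the "editable text profile" idea (hypothetical names/shapes).
import torch
import torch.nn as nn

class TextProfileRecommender(nn.Module):
    def __init__(self, text_encoder, n_items, d=384):
        super().__init__()
        self.text_encoder = text_encoder          # any sentence encoder: str -> R^d
        self.item_emb = nn.Embedding(n_items, d)  # learned item embeddings

    def forward(self, summary: str):
        u = self.text_encoder(summary)            # user vector from the editable summary
        return self.item_emb.weight @ u           # one score per item

# rec = TextProfileRecommender(my_encoder, n_items=10_000)
# scores = rec("Loves slow-burn sci-fi; lately into baking shows; avoid horror.")
# Editing the summary string changes the ranking; that's the scrutability knob.
```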
Learned optimizers can’t generalize to large unseen tasks… until now! Excited to present μLO: Compute-Efficient Meta-Generalization of Learned Optimizers! Don’t miss my talk about it next Sunday at the OPT 2024 workshop at NeurIPS :) 🧵 arxiv.org/abs/2406.00153 1/N
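For context, here is the general per-parameter learned-optimizer recipe, not μLO's exact features and not its μP scaling (which is the paper's contribution): a tiny MLP maps per-parameter features such as the gradient and momentum to an update, and is itself meta-trained by unrolling optimization on small tasks.

```python
# Bare-bones per-parameter learned optimizer (generic recipe, hypothetical features).
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def step(self, params, grads, momenta, beta=0.9, scale=1e-3):
        new_params, new_momenta = [], []
        for p, g, m in zip(params, grads, momenta):
            m = beta * m + (1 - beta) * g
            feats = torch.stack([p, g, m], dim=-1)   # per-parameter input features
            update = self.net(feats).squeeze(-1)     # tiny MLP proposes the update
            new_params.append(p - scale * update)
            new_momenta.append(m)
        return new_params, new_momenta
```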
Introducing a framework for end-to-end discovery of data structures—no predefined algorithms or hand-tuning needed. Work led by Omar Salemohamed. More details below. arxiv.org/abs/2411.03253
Come study with us at Mila! I will be looking for new students to work with. Our current projects explore continual learning, modularity, scrutability, algorithm discovery, AI for law (reasoning), invariances, and decision-making...
Mila's annual supervision request process opens on October 15 to receive MSc and PhD applications for Fall 2025 admission! Join our community! More information here mila.quebec/en/prospective…
Happy to share the first paper of my master's! Big kudos to my very cool co-authors: @zek3r, @tomjiralerspong, Alex Payeur, @mattperich, Luca Mazzucato, and @g_lajoie_
Can we perform unbiased Bayesian posterior inference with a diffusion model prior? We propose Relative Trajectory Balance (RTB), which allows us to directly optimize for this posterior model. We apply this to several tasks in image, language, and control! 🧵 arxiv.org/abs/2405.20971
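Sketching the loss as I understand it from the trajectory-balance family (check the paper for the exact form): for a diffusion trajectory τ ending at sample x₀, the finetuned posterior model q and the frozen prior p should satisfy Z · q(τ) = p(τ) · r(x₀) at the optimum, which gives a squared log-ratio objective.

```python
# Trajectory-balance-style posterior loss, per sampled trajectory (my reading of RTB).
import torch

def rtb_loss(log_q_traj, log_p_traj, log_reward, log_Z):
    # log_q_traj: log-prob of the trajectory under the finetuned posterior sampler
    # log_p_traj: log-prob under the frozen diffusion prior
    # log_reward: log r(x0), e.g., a classifier log-likelihood for conditioning
    # log_Z:      learned scalar estimating the log normalizing constant
    return (log_Z + log_q_traj - log_p_traj - log_reward) ** 2

log_Z = torch.zeros(1, requires_grad=True)   # trained jointly with the sampler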
How can we generate interesting edge cases to test our autonomous vehicles in simulation? We propose CtRL-Sim, a novel framework for closed-loop behaviour simulation that enables fine-grained control over agent behaviours. 🧵 1/8 arxiv.org/abs/2403.19918
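One way to picture the "fine-grained control" knob, based on my reading of the abstract (hypothetical shapes; the paper's multi-agent factorization is more involved): the behavior model predicts a distribution over discretized returns, and exponentially tilting that distribution with a coefficient λ biases agents toward good driving (λ > 0) or adversarial, edge-case behavior (λ < 0) before sampling return-conditioned actions.

```python
# Exponential tilting of a predicted return distribution (illustrative sketch).
import torch

def tilted_return_sample(return_logits, bin_values, lam):
    # return_logits: (n_bins,) predicted log-weights over discretized returns
    # lam: tilting coefficient; lam = 0 recovers the data distribution of behaviors
    tilted = return_logits + lam * bin_values   # exponential tilt in log space
    probs = torch.softmax(tilted, dim=-1)
    return torch.multinomial(probs, 1)          # return bin the policy conditions on

bins = torch.linspace(-1.0, 1.0, steps=21)      # discretized returns
logits = torch.randn(21)
adversarial_bin = tilted_return_sample(logits, bins, lam=-5.0)  # tilt toward low returns
```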