Avery Ma
@avery__ma
A renowned researcher in the field just stopped by my poster and we chatted. One of the best moments of my career so far.
We often use #VAML/#MuZero losses with deterministic models. But if we want stochastic models, whether to measure uncertainty or to leverage current SOTA models such as #transformers and #diffusion, we need to take care! Naively translating the loss functions leads to mistakes!
Would you be surprised that many empirical implementations of value-aware model learning (VAML) algos, including MuZero, lead to incorrect models & value functions when training stochastic models 🤕? In our new @icml_conf 2025 paper, we show why this happens and how to fix it 🦾!
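A rough sketch of the failure mode, in my own notation rather than the paper's: with a stochastic model, the naive per-sample squared loss also penalizes the model's own variance, so its minimizer collapses toward a deterministic model even when the true dynamics are stochastic.

```latex
% Sketch only; notation is mine, not necessarily the paper's.
% V: fixed value function, P: true dynamics, \hat{P}: learned stochastic model.
\[
\mathcal{L}_{\mathrm{naive}}(\hat{P})
  = \mathbb{E}_{s' \sim P(\cdot\mid s,a)}\,
    \mathbb{E}_{\hat{s}' \sim \hat{P}(\cdot\mid s,a)}
    \big[\big(V(s') - V(\hat{s}')\big)^2\big]
\]
% With X = V(s') and Y = V(\hat{s}') independent,
% E[(X-Y)^2] = (E[X]-E[Y])^2 + Var(X) + Var(Y), so
\[
\mathcal{L}_{\mathrm{naive}}(\hat{P})
  = \big(\mathbb{E}_{P}[V] - \mathbb{E}_{\hat{P}}[V]\big)^2
    + \mathrm{Var}_{P}[V] + \mathrm{Var}_{\hat{P}}[V].
\]
% Minimizing this shrinks \mathrm{Var}_{\hat{P}}[V], pushing the model toward
% determinism. A value-aware objective should compare expectations instead:
\[
\mathcal{L}_{\mathrm{VAML}}(\hat{P})
  = \big(\mathbb{E}_{s' \sim P}[V(s')]
       - \mathbb{E}_{\hat{s}' \sim \hat{P}}[V(\hat{s}')]\big)^2.
\]
```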
🎉Good news, everyone! 🎉 I will be recruiting graduate students to work on the algorithmic and theoretical aspects of Reinforcement Learning. You will join Adage, @Mila_Quebec, and @polymtl. More info on why and how you should apply: academic.sologen.net/2024/11/22/gra… Deadline: Dec 1st
I’ll be presenting our work on understanding the robustness difference between models trained via different optimizers at @iclr_conf. Visit our poster (Friday 4:30-6:30, Halle B #101) to learn about the pitfall of adaptive gradient methods. #ICLR2024 Paper: arxiv.org/abs/2308.06703
"Without a perfect model, model-based RL is hopeless!" Our paper at #ICLR2024 challenges this belief! Even an inaccurate model can help a lot. Don’t throw it away! Title: Maximum Entropy Model Correction in Reinforcement Learning Paper: openreview.net/forum?id=kNpSU… 🧵(1/7)
Blog: Is Your Neural Network at Risk? The Pitfall of Adaptive Gradient Optimizers Summary: Models trained using SGD exhibit significantly higher robustness to input perturbations than those trained via adaptive gradient methods such as Adam or RMSProp. vectorinstitute.ai/is-your-neural…
Another paper rejected,
CVPR review, GPT-suspected,
AC inaction, disappointed,
Innovation, undetected,
To ECCV, resubmitted.
Did you know that the optimizer significantly affects the robustness of NNs? And Adam is the wrong answer! 😯 "Understanding the robustness difference between SGD and adaptive gradient methods" dives deep into this. Paper: openreview.net/forum?id=ed8Sk… Code: github.com/averyma/opt-ro… 🧵1/4
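To make that claim concrete, here is a toy, hypothetical probe, not the paper's code; the data, architecture, and hyperparameters are my own choices: train the same small network with SGD and with Adam, then compare accuracy under Gaussian input noise.

```python
# Hypothetical sketch (not the paper's code): train identical MLPs with SGD
# and Adam, then compare accuracy under Gaussian input perturbations.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic binary classification in 20-D: the label depends on two inputs.
n, d = 2000, 20
X = torch.randn(n, d)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()

def make_model():
    return nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 2))

def train(model, opt, epochs=200):
    # Full-batch training, purely for simplicity.
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

def noisy_accuracy(model, sigma):
    # Accuracy on inputs perturbed by isotropic Gaussian noise of scale sigma.
    with torch.no_grad():
        preds = model(X + sigma * torch.randn_like(X)).argmax(dim=1)
    return (preds == y).float().mean().item()

sgd_model = make_model()
train(sgd_model, torch.optim.SGD(sgd_model.parameters(), lr=0.1))

adam_model = make_model()
train(adam_model, torch.optim.Adam(adam_model.parameters(), lr=1e-3))

for sigma in (0.0, 0.5, 1.0, 2.0):
    print(f"sigma={sigma}: SGD acc={noisy_accuracy(sgd_model, sigma):.3f}, "
          f"Adam acc={noisy_accuracy(adam_model, sigma):.3f}")
```

On a synthetic problem this small, the gap may be modest or noisy; the sketch only illustrates the measurement protocol, not the paper's experiments.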