Enea Monzio Compagnoni @ Flexion Robotics
@EneaMC
PhD Student in Stochastic Optimization for Deep Learning @ the University of Basel. I smash calculations until nightfall. Past: UBS; Yahoo! Research.
We asked SDEs for wisdom. They said: ‘DSignSGD = Chad, DCSGD = Sad.’💀🔥 #Oral #AISTATS2025 Noise hits compression differently! 📉 DCSGD crumbles under large & heavy-tailed noise. 💪 DSignSGD? Still rocks. 📜 Scaling rules for Distributed Learning!👇 arxiv.org/abs/2502.17009
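For intuition, here is a minimal toy sketch (my own, not the paper's setup) contrasting a top-k-compressed distributed SGD step with a sign-compressed one on a quadratic, under heavy-tailed Student-t gradient noise. All names, compressors, and constants are illustrative assumptions.

# Toy sketch (illustrative only): compare a top-k-compressed distributed SGD
# step against a sign-compressed one on f(x) = 0.5 * ||x||^2, with
# heavy-tailed (Student-t, df=2) gradient noise on each worker.
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, lr, steps = 50, 8, 0.05, 500

def top_k(g, k=5):                      # keep only the k largest-magnitude coordinates
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def run(compress):
    x = np.ones(d)
    for _ in range(steps):
        grads = [x + rng.standard_t(df=2, size=d) for _ in range(n_workers)]  # heavy-tailed noise
        x -= lr * np.mean([compress(g) for g in grads], axis=0)               # aggregate compressed grads
    return np.linalg.norm(x)

print("DCSGD-like (top-k):  ", run(top_k))    # tends to suffer under the outliers
print("DSignSGD-like (sign):", run(np.sign))  # per-coordinate step is bounded by lr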

Like @micahgoldblum and coauthors, we also found that small batches make SGD effective in LM training. It's cool that our papers came out around the same time, each with a different perspective! Below, our take on why this happens. Our awesome team: @teodorasrec @jonasgeiping…
Come to HiLD tomorrow @ICML2025! We have 4 posters on optimization:
- In Search of Adam’s Secret Sauce
- Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
- On the Interaction of Noise, Compression Role, and Adaptivity under (L0,L1)-Smoothness…
To make things elegant (intuition will come later), let us talk about the SDE approximations of SGD and SignSGD (a good model for Adam). The SignSGD result is by @EneaMC (arxiv.org/abs/2411.15958), a must-read if you want to understand how adaptive methods react to noise.
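For context, the standard first-order SDE approximation of SGD (learning rate η, loss f, gradient-noise covariance Σ) is written below; the SignSGD SDE derived in the linked paper is more involved and depends explicitly on the noise distribution.

% First-order SDE approximation of SGD: learning rate \eta, loss f,
% gradient-noise covariance \Sigma(X_t), Brownian motion W_t.
\[
  dX_t = -\nabla f(X_t)\, dt + \sqrt{\eta}\, \Sigma(X_t)^{1/2}\, dW_t .
\]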
🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. Two tracks: ① Showdown Battling – imperfect-info, turn-based strategy ② Pokemon Emerald Speedrunning – long horizon RPG planning 5 M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid…
Pass by if you want to know about scaling up your model under distribution shifts in the training data. Takeaway: muP needs to be tuned to the amount of feature learning that optimizes the forgetting/plasticity trade-off.
🚨 Excited to present our new paper at 🇨🇦 #ICML2025! 🚨 "The Importance of Being Lazy: Scaling Limits of Continual Learning" Great collab with @alebreccia99, @glanzillo11 , Thomas Hofmann, @lorenzo_noci. 🧵 1/6
I’m pleased to share that, starting August 1, I will be joining MBZUAI (@mbzuai) as an Assistant Professor in the Department of Statistics and Data Science. My research focuses on optimization for machine learning, with an emphasis on stochastic methods, federated learning,…
I have 6 papers in my batch as a reviewer at @NeurIPSConf. I have reviewed 4 of them so far, and 3 of those have mistakes in the proofs… and the mistakes are usually easy to spot. At least one of the papers seems genuinely interesting to me, which is also rare.
If you are ever ablating on LM training, this is the ONLY codebase I trust, by the amazing Nico.
Great work with tons of ablations and a nice interpretation of Adam as an online variational inference method! And super proud they used plainLM to train "over 1,300 models across different data and scales" (: github.com/Niccolo-Ajrold…
Adam is similar to many algorithms, but cannot be effectively replaced by any simpler variant in LMs. The community is starting to get the recipe right, but what is the secret sauce? @gowerrobert and I found that it has to do with the beta parameters and variational inference.…
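To pin down what "the beta parameters" refer to, here is a plain sketch of the standard Adam step (not the paper's variational reformulation): beta1 is the EMA rate for the gradient, beta2 the EMA rate for the squared gradient.

# Standard Adam step (sketch): beta1 smooths the gradient (momentum term),
# beta2 smooths the squared gradient (the variance-like term the variational
# view reinterprets). t is the step counter, starting at 1.
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad          # EMA of gradients
    v = beta2 * v + (1 - beta2) * grad**2       # EMA of squared gradients
    m_hat = m / (1 - beta1**t)                  # bias corrections
    v_hat = v / (1 - beta2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v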
We have a new SSM theory paper, just accepted to COLT, revisiting the recall properties of linear RNNs. It's surprising how deep one can go, and how beautiful it becomes. With (and only thanks to) the amazing Alexandre and @BachFrancis arxiv.org/pdf/2502.09287
Everybody gangsta until SDEs work! #ICLR2025 Noise hits every optimizer differently! For SignSGD, adaptivity increases resistance to gradient noise, while AdamW enjoys extreme stability. Plus, AdamW has an exciting new scaling rule! More below👇! arxiv.org/abs/2411.15958
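As a toy illustration of the resistance claim (my own example, not from the paper): the sign nonlinearity caps each coordinate's step at the learning rate, so a single heavy-tailed spike cannot blow up the update.

# One heavy-tailed gradient spike: plain SGD follows the outlier, while the
# SignSGD step is bounded by lr in every coordinate.
import numpy as np

lr = 0.1
g_outlier = np.array([0.3, -0.5, 1e6])            # one coordinate hit by a noise spike
print("SGD step:    ", -lr * g_outlier)           # blown up by the outlier
print("SignSGD step:", -lr * np.sign(g_outlier))  # each coordinate bounded by lr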
@orvieto_antonio congratulations on this afternoon's talk in Naples, which I followed on YouTube. Perhaps sooner or later we would be glad to hear you live at PLACEBO FOUNDATION in poor Lucania.
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
🚀 Stronger performance, better privacy — no compromises! 📖 Check it out for more details! 🔗 arxiv.org/abs/2502.11682 Joint work with @sam_hrvth, @AurelienLucchi, @peter_richtarik, @ed_gorbunov