Alvaro Arroyo
@arroyo_alvr
PhD ML @UniofOxford; Transformers & Graph Representation Learning; Previously at @imperialcollege
Vanishing gradients are central to RNNs and SSMs, but how do they affect GNNs? We explore this in our new paper! w/ A. Gravina, @benpgutteridge @fedzbar C. Gallicchio @epomqo @mmbronstein @trekkinglemon 🔗 arxiv.org/abs/2502.10818 🧵(1/11)
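A toy sketch of the phenomenon (mine, not the paper's setup): in a deep GCN-style stack, the gradient of the final node states with respect to the input features shrinks rapidly with depth, the graph analogue of vanishing gradients in RNNs and SSMs.

```python
# Toy illustration (not the paper's experiments): gradient norm decay in a deep GNN.
import torch

torch.manual_seed(0)
n_nodes, dim, depth = 8, 16, 32

# Random undirected adjacency with self-loops, row-normalised as a toy propagation matrix.
A = (torch.rand(n_nodes, n_nodes) < 0.3).float()
A = ((A + A.t()) > 0).float()
A.fill_diagonal_(1.0)
A = A / A.sum(dim=1, keepdim=True)

W = torch.nn.Linear(dim, dim, bias=False)
x0 = torch.randn(n_nodes, dim, requires_grad=True)

h = x0
for _ in range(depth):
    h = torch.tanh(A @ W(h))  # one layer: propagate over the graph, transform, squash

# Sensitivity of the final node states to the input features; its norm shrinks with depth.
grad = torch.autograd.grad(h.sum(), x0)[0]
print(f"||d h_L / d x_0|| after {depth} layers: {grad.norm():.3e}")
```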
🚨 ICML 2025 Paper 🚨 "On Measuring Long-Range Interactions in Graph Neural Networks" We formalize the long-range problem in GNNs: 💡Derive a principled range measure 🔧 Tools to assess models & benchmarks 🔬Critically assess LRGB 🧵 Thread below 👇 #ICML2025
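As a rough companion to the thread, here is one generic way to probe range that is *not* the paper's measure: average node-to-node Jacobian sensitivity, grouped by hop distance, for a small GNN on a ring graph. Sensitivity decays with distance and is exactly zero beyond the receptive field.

```python
# Generic sensitivity-vs-distance probe (a sketch, not the paper's range measure).
import torch

torch.manual_seed(0)
n, dim, depth = 10, 8, 4

# Ring graph, so shortest-path distances are easy to compute.
A = torch.zeros(n, n)
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0
A = A + torch.eye(n)
A = A / A.sum(1, keepdim=True)

W = torch.nn.Linear(dim, dim, bias=False)
x = torch.randn(n, dim)

def gnn(x):
    h = x
    for _ in range(depth):
        h = torch.tanh(A @ W(h))
    return h

J = torch.autograd.functional.jacobian(gnn, x)        # shape [n, dim, n, dim]
sens = J.square().sum(dim=(1, 3)).sqrt()              # [n, n] node-to-node sensitivity
dist = torch.tensor([[min(abs(i - j), n - abs(i - j)) for j in range(n)] for i in range(n)])
for d in range(depth + 1):
    print(f"hop distance {d}: mean sensitivity {sens[dist == d].mean():.3e}")
```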
Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: arxiv.org/abs/2506.11997
📢 ChebNet is back, with long-range abilities on graphs! 🎉 We revive ChebNet for long-range tasks, uncover instability in polynomial filters, and propose Stable-ChebNet, a non-dissipative dynamical system with controlled, stable info propagation 🚀 📄: arxiv.org/abs/2506.07624
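For context, a minimal sketch of the Chebyshev polynomial filtering that ChebNet builds on (fixed scalar coefficients, no learning, and nothing from Stable-ChebNet itself): the filter is a degree-K polynomial of the rescaled Laplacian, computed with the recurrence T_k = 2·L̃·T_{k-1} − T_{k-2}.

```python
# Chebyshev polynomial graph filter, a sketch of the classic ChebNet building block.
import torch

def cheb_filter(X, L_tilde, thetas):
    """X: [n, d] node features; L_tilde: rescaled Laplacian with spectrum in [-1, 1];
    thetas: fixed scalar coefficients here (learned per feature map in a real layer)."""
    T_prev, T_curr = X, L_tilde @ X                              # T_0(L̃)X and T_1(L̃)X
    out = thetas[0] * T_prev + thetas[1] * T_curr
    for theta in thetas[2:]:
        T_prev, T_curr = T_curr, 2 * (L_tilde @ T_curr) - T_prev  # Chebyshev recurrence
        out = out + theta * T_curr
    return out

n, d = 6, 4
A = torch.ones(n, n) - torch.eye(n)                  # toy complete graph
D_inv_sqrt = torch.diag(A.sum(1).rsqrt())
L = torch.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt       # symmetric normalised Laplacian
L_tilde = L - torch.eye(n)                           # rescale, assuming lambda_max ≈ 2
print(cheb_filter(torch.randn(n, d), L_tilde, thetas=[0.5, 0.3, 0.2]).shape)
```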
How we "guessed" the Pope using network science: inside the cardinal network. A study by me, Beppe Soda and Alessandro Iorio. Article: unibocconi.it/en/news/networ… @Unibocconi
A Bayesian’s take on filtering without Bayes. Part III: The Kalman filter. In this post, we walk through the derivation of the Kalman filter without priors or posteriors and explore its application to time-series forecasting and online learning. grdm.io/posts/filterin…
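If you want something runnable next to the post, here is a minimal local-level Kalman filter doing online one-step-ahead forecasting (the standard recursions with known noise variances, not necessarily the post's prior-free derivation):

```python
# Local-level Kalman filter sketch: state x_t = x_{t-1} + w_t, observation y_t = x_t + v_t.
import numpy as np

def kalman_local_level(ys, q=0.1, r=1.0):
    """q: process-noise variance, r: observation-noise variance (both assumed known).
    Returns the one-step-ahead forecasts produced online as each observation arrives."""
    m, p = 0.0, 1e6                  # diffuse initial mean / variance
    forecasts = []
    for y in ys:
        m_pred, p_pred = m, p + q    # predict: random-walk state, variance grows by q
        forecasts.append(m_pred)     # one-step-ahead forecast of y_t
        k = p_pred / (p_pred + r)    # Kalman gain: how much to trust the new observation
        m = m_pred + k * (y - m_pred)
        p = (1 - k) * p_pred
    return np.array(forecasts)

rng = np.random.default_rng(0)
truth = np.cumsum(rng.normal(scale=0.3, size=200))   # latent random walk
ys = truth + rng.normal(scale=1.0, size=200)         # noisy observations
print("one-step forecast MSE:", np.mean((kalman_local_level(ys) - truth) ** 2))
```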
i started out by studying graph attention networks and now i'm... basically studying graph attention networks again?! 😅
LLMs anchor themselves on the first token to dampen and stabilize the interactions among the other tokens. A great explanation of attention sinks with minimal math, and great diagrams!
Why do LLMs attend to the first token? This new paper explains why LLMs obsessively focus attention on the first token — a phenomenon known as an attention sink. Their theory: it’s a useful trick to prevent representational collapse in deep Transformers. • Sinks = over-mixing…
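A toy illustration of that intuition (my own sketch, not the paper's experiments): when extra attention mass is parked on the first token, a perturbation injected into one token spreads far less into the others through a deep attention stack.

```python
# Attention-sink toy: a sink token dampens how much tokens mix with each other.
import torch

torch.manual_seed(0)
seq, dim, depth = 16, 32, 12
Q = torch.nn.Linear(dim, dim, bias=False)
K = torch.nn.Linear(dim, dim, bias=False)
V = torch.nn.Linear(dim, dim, bias=False)
norm = torch.nn.LayerNorm(dim)

def forward(x, sink_bias):
    """Stripped-down pre-norm attention stack; sink_bias adds extra logit mass
    to the first token, mimicking an attention sink."""
    for _ in range(depth):
        h = norm(x)
        logits = Q(h) @ K(h).t() / dim ** 0.5
        logits[:, 0] += sink_bias
        x = x + torch.softmax(logits, dim=-1) @ V(h)
    return x

x = torch.randn(seq, dim)
x_pert = x.clone()
x_pert[5] += torch.randn(dim)            # perturb a single token's input

with torch.no_grad():
    for bias in (0.0, 6.0):
        delta = forward(x_pert, bias) - forward(x, bias)
        spread = delta[torch.arange(seq) != 5].norm()   # effect on the *other* tokens
        print(f"sink_bias={bias}: perturbation spread to other tokens = {spread:.2f}")
```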
Fresh out of the oven 🥖🍞, stay tuned 👀. When someone beats you to your own paper announcement lol
New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.
New open source reasoning model! Huginn-3.5B reasons implicitly in latent space 🧠 Unlike O1 and R1, latent reasoning doesn’t need special chain-of-thought training data, and doesn't produce extra CoT tokens at test time. We trained on 800B tokens 👇
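A toy sketch of the recurrent-depth idea (my illustration, not Huginn's actual architecture): a shared block is iterated on a latent state a variable number of times at test time, so extra "thinking" costs compute but never emits extra CoT tokens.

```python
# Latent reasoning by recurrent depth, in miniature (untrained, purely illustrative).
import torch

torch.manual_seed(0)
dim, vocab = 64, 100
embed = torch.nn.Embedding(vocab, dim)
block = torch.nn.Sequential(torch.nn.Linear(dim, dim), torch.nn.GELU(), torch.nn.Linear(dim, dim))
readout = torch.nn.Linear(dim, vocab)

def predict(tokens, n_iters):
    """More latent iterations = more implicit 'thinking', same output length."""
    h = embed(tokens).mean(dim=0)        # crude pooled context state
    for _ in range(n_iters):
        h = h + block(h)                 # iterate a shared block in latent space
    return readout(h).argmax()

tokens = torch.tensor([3, 14, 15, 92])
for n in (1, 4, 16):
    print(f"{n:2d} latent iterations -> next-token id {predict(tokens, n).item()}")
```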
A great interview of @fedzbar by @ecsquendor (for @MLStreetTalk), discussing our NeurIPS'24 paper. Check it out to learn more about why Transformers need Glasses! 👓 youtube.com/watch?v=FAspMn…
New preprint! 🚨 We scale equilibrium sampling to a hexapeptide (in Cartesian coordinates!) with Sequential Boltzmann generators! 📈 🤯 Work with @bose_joey, @WillLin1028, @leonklein26, @mmbronstein and @AlexanderTong7 Thread 🧵 1/11