Antonio Orvieto
@orvieto_antonio
Deep Learning PI @ELLISInst_Tue, Group Leader @MPI_IS. I compute stuff with lots of gradients 🧮, I like Kierkegaard & Lévi-Strauss 🧙‍♂️
Come to HiLD tomorrow @ICML2025! We have 4 posters on optimization:
- In Search of Adam’s Secret Sauce
- Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
- On the Interaction of Noise, Compression Role, and Adaptivity under (L0,L1)-Smoothness…
A fundamental question in deep learning is how semantically similar learned functions relate in parameter space. Understanding this is essential for generalization, robustness and continual learning. @Theus__A just released new updates on this problem, specific to the…
1/ 🚨 New paper alert! 🚨 We explore a key question in deep learning: Can independently trained Transformers be linearly connected in weight space — without a loss barrier? Yes — if you uncover their rich symmetries. 📄 arXiv: arxiv.org/abs/2506.22712
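For intuition, here is a minimal sketch of how a loss barrier along the linear path between two trained networks is typically measured. This is the generic connectivity check, not the paper's symmetry-matching procedure; `model_a`, `model_b`, and the `eval_loss` callback are hypothetical stand-ins.

```python
# Minimal sketch: loss barrier along the linear path between two models.
# Assumes model_a and model_b share an architecture, and eval_loss(model)
# returns a scalar validation loss (hypothetical helper).
import copy
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Elementwise linear interpolation (1 - alpha) * A + alpha * B."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

@torch.no_grad()
def loss_barrier(model_a, model_b, eval_loss, num_points=11):
    """Max loss along the linear path minus the average endpoint loss."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        probe.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        losses.append(eval_loss(probe))
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```

A barrier near zero along this path is what "linearly connected in weight space" means; the paper's contribution is finding the symmetry transformations that make this hold for independently trained Transformers.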
Europe can lead AI research, and our plan with OpenEuroLLM is to build something amazing - not for profit, while sharing insights at scale. We have openings for **ML Research Engineers and Scientists** to work on OpenEuroLLM at the ELLIS Institute Tübingen.…
Join our mission to strengthen AI research in Europe 🇪🇺 We are looking for several ML Research Engineers and Scientists to work on OpenEuroLLM at the ELLIS Institute Tübingen. If you're passionate about large-scale model training and multilingual evaluation, and want to contribute to…
Despite theoretically handling long contexts, existing recurrent models still fall short in practice: they may fail to generalize past the training length. We show a simple and general fix that enables length generalization on sequences of up to 256k tokens, with no need to change the architecture!
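A minimal sketch of how the failure mode itself can be probed, assuming a recurrent language model that returns per-token logits; the names and shapes here are illustrative stand-ins, and the paper's actual fix is not reproduced.

```python
# Minimal sketch: measure length generalization by comparing per-position
# loss inside vs. beyond the training length. `model` is a hypothetical
# recurrent LM mapping (B, T) token ids to (B, T, V) logits.
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_position_loss(model, tokens, train_len):
    """Mean next-token cross-entropy inside vs. beyond train_len."""
    logits = model(tokens[:, :-1])                       # (B, T-1, V)
    losses = F.cross_entropy(
        logits.transpose(1, 2), tokens[:, 1:], reduction="none"
    )                                                    # (B, T-1)
    inside = losses[:, :train_len].mean().item()
    beyond = losses[:, train_len:].mean().item()
    return inside, beyond  # a large gap signals poor length generalization
```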
Excited to announce our recent work on low-precision deep learning via biologically inspired noisy log-normal multiplicative dynamics (LMD). It allows us to train large neural nets (such as GPT-2 and ViT) in FP6. arxiv.org/abs/2506.17768
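As a rough illustration of what log-normal multiplicative noise on weights looks like; this is a hedged reading of the "LMD" name, not the paper's actual update rule or its FP6 training recipe.

```python
# Toy sketch: log-normal multiplicative weight perturbation.
# Each weight is rescaled by exp(sigma * eps) with eps ~ N(0, 1), so the
# noise factor is log-normal: weights are scaled, never sign-flipped,
# mimicking multiplicative synaptic noise. Illustration only.
import torch

def lognormal_perturb(weight: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    eps = torch.randn_like(weight)
    return weight * torch.exp(sigma * eps)
```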
There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations. ❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer…
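A minimal sketch of the answer-matching setup the post describes: grade a model's free-form answer against a reference answer instead of scoring multiple-choice letters. A trivial normalized string comparison stands in here for the language-model matcher; `generate_answer` and the dataset format are hypothetical.

```python
# Minimal sketch of answer matching for free-form LM evaluation.
# A normalized exact match is a crude stand-in for an LM judge.
import re

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()

def answer_matches(prediction: str, reference: str) -> bool:
    """Stand-in matcher: swap in an LM judge for real evaluations."""
    return normalize(prediction) == normalize(reference)

# Hypothetical usage over a dataset of (question, reference) pairs:
# accuracy = sum(answer_matches(generate_answer(q), ref)
#                for q, ref in dataset) / len(dataset)
```

The key difference from MCQ scoring is that the model never sees candidate choices, so shortcuts based on the choices alone are ruled out.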
Is equivariance necessary for a good 3D molecule generative model? Check out our #icml2025 paper, which closes the performance gap between non-equivariant and equivariant diffusion models via rotational alignment, while also being more efficient (1/7): arxiv.org/abs/2506.10186
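For reference, rotational alignment of 3D point clouds is classically done with the Kabsch algorithm; a minimal sketch follows, with the caveat that the paper's exact alignment step may differ.

```python
# Minimal sketch: Kabsch algorithm for optimal rotational alignment of a
# molecule's coordinates onto a reference conformation (minimizes RMSD).
import numpy as np

def kabsch_align(coords: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Rotate `coords` (N, 3) to best match `reference` (N, 3)."""
    p = coords - coords.mean(axis=0)        # center both point clouds
    q = reference - reference.mean(axis=0)
    h = p.T @ q                             # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + reference.mean(axis=0)
```

Aligning samples this way lets a non-equivariant model avoid spending capacity on arbitrary rotations, which is the intuition the paper builds on.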
Literally the best walk through optimization, all the way to deep learning
Big thanks to the COLT 2025 organizers for an awesome event in Lyon! Here are the slides from my keynote this morning in case you’re curious about the references I mentioned: di.ens.fr/~fbach/fbach_o…