Antonio Orvieto
@orvieto_antonio
Deep Learning PI @ELLISInst_Tue, Group Leader @MPI_IS. I compute stuff with lots of gradients 🧮, I like Kierkegaard & Lévi-Strauss 🧙‍♂️
Come to HiLD tomorrow @ICML2025! We have 4 posters on optimization:
- In Search of Adam’s Secret Sauce
- Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
- On the Interaction of Noise, Compression Role, and Adaptivity under (L0,L1)-Smoothness…
A fundamental question in deep learning is how semantically similar learned functions relate in parameter space. Understanding this is essential for generalization, robustness and continual learning. @Theus__A just released new updates on this problem, specific to the…
1/ 🚨 New paper alert! 🚨 We explore a key question in deep learning: Can independently trained Transformers be linearly connected in weight space — without a loss barrier? Yes — if you uncover their rich symmetries. 📄 arXiv: arxiv.org/abs/2506.22712
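For intuition, here is a minimal sketch of how a loss barrier along the linear path between two trained networks is typically measured. This is the generic connectivity check, not the paper's symmetry-matching procedure; `model_a`, `model_b`, and the `eval_loss` callback are hypothetical stand-ins.

```python
# Minimal sketch: loss barrier along the linear path between two models.
# Assumes model_a and model_b share an architecture, and eval_loss(model)
# returns a scalar validation loss (hypothetical helper).
import copy
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Elementwise linear interpolation (1 - alpha) * A + alpha * B."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

@torch.no_grad()
def loss_barrier(model_a, model_b, eval_loss, num_points=11):
    """Max loss along the linear path minus the average endpoint loss."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        probe.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        losses.append(eval_loss(probe))
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```

A barrier near zero along this path is what "linearly connected in weight space" means; the paper's contribution is finding the symmetry transformations that make this hold for independently trained Transformers.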
Europe can lead AI research, and our plan with OpenEuroLLM is to build something amazing - not for profit, while sharing insights at scale. We have openings for **ML Research Engineers and Scientists** to work on OpenEuroLLM at the ELLIS Institute Tübingen.…
Join our mission to strengthen AI research in Europe 🇪🇺 We are looking for several ML Research Engineers and Scientists to work on OpenEuroLLM at the ELLIS Institute Tübingen. If you're passionate about large-scale model training and multilingual evaluation, and want to contribute to…
Despite theoretically handling long contexts, existing recurrent models still fall short in practice: they may fail to generalize past the training length. We show a simple and general fix that enables length generalization on sequences of up to 256k tokens, with no need to change the architecture!
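A minimal sketch of how the failure mode itself can be probed, assuming a recurrent language model that returns per-token logits; the names and shapes here are illustrative stand-ins, and the paper's actual fix is not reproduced.

```python
# Minimal sketch: measure length generalization by comparing per-position
# loss inside vs. beyond the training length. `model` is a hypothetical
# recurrent LM mapping (B, T) token ids to (B, T, V) logits.
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_position_loss(model, tokens, train_len):
    """Mean next-token cross-entropy inside vs. beyond train_len."""
    logits = model(tokens[:, :-1])                       # (B, T-1, V)
    losses = F.cross_entropy(
        logits.transpose(1, 2), tokens[:, 1:], reduction="none"
    )                                                    # (B, T-1)
    inside = losses[:, :train_len].mean().item()
    beyond = losses[:, train_len:].mean().item()
    return inside, beyond  # a large gap signals poor length generalization
```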
Excited to announce our recent work on low-precision deep learning via biologically inspired noisy log-normal multiplicative dynamics (LMD). It allows us to train large neural nets (such as GPT-2 and ViT) in FP6. arxiv.org/abs/2506.17768
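As a rough illustration of what log-normal multiplicative noise on weights looks like; this is a hedged reading of the "LMD" name, not the paper's actual update rule or its FP6 training recipe.

```python
# Toy sketch: log-normal multiplicative weight perturbation.
# Each weight is rescaled by exp(sigma * eps) with eps ~ N(0, 1), so the
# noise factor is log-normal: weights are scaled, never sign-flipped,
# mimicking multiplicative synaptic noise. Illustration only.
import torch

def lognormal_perturb(weight: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    eps = torch.randn_like(weight)
    return weight * torch.exp(sigma * eps)
```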
There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations. ❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer…
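A minimal sketch of the answer-matching setup the post describes: grade a model's free-form answer against a reference answer instead of scoring multiple-choice letters. A trivial normalized string comparison stands in here for the language-model matcher; `generate_answer` and the dataset format are hypothetical.

```python
# Minimal sketch of answer matching for free-form LM evaluation.
# A normalized exact match is a crude stand-in for an LM judge.
import re

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()

def answer_matches(prediction: str, reference: str) -> bool:
    """Stand-in matcher: swap in an LM judge for real evaluations."""
    return normalize(prediction) == normalize(reference)

# Hypothetical usage over a dataset of (question, reference) pairs:
# accuracy = sum(answer_matches(generate_answer(q), ref)
#                for q, ref in dataset) / len(dataset)
```

The key difference from MCQ scoring is that the model never sees candidate choices, so shortcuts based on the choices alone are ruled out.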
Is equivariance necessary for a good 3D molecule generative model? Check out our #icml2025 paper, which closes the performance gap between non-equivariant and equivariant diffusion models via rotational alignment, while also being more efficient (1/7): arxiv.org/abs/2506.10186
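For reference, rotational alignment of 3D point clouds is classically done with the Kabsch algorithm; a minimal sketch follows, with the caveat that the paper's exact alignment step may differ.

```python
# Minimal sketch: Kabsch algorithm for optimal rotational alignment of a
# molecule's coordinates onto a reference conformation (minimizes RMSD).
import numpy as np

def kabsch_align(coords: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Rotate `coords` (N, 3) to best match `reference` (N, 3)."""
    p = coords - coords.mean(axis=0)        # center both point clouds
    q = reference - reference.mean(axis=0)
    h = p.T @ q                             # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + reference.mean(axis=0)
```

Aligning samples this way lets a non-equivariant model avoid spending capacity on arbitrary rotations, which is the intuition the paper builds on.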
Literally the best walk through optimization, all the way to deep learning
Big thanks to the COLT 2025 organizers for an awesome event in Lyon! Here are the slides from my keynote this morning in case you’re curious about the references I mentioned: di.ens.fr/~fbach/fbach_o…