Felix Sarnthein
@__safelix__
PhD student in machine learning at @ELLISInst_Tue, @MPI_IS and @CSatETH with @orvieto_antonio. Prev: MSc in CS at @ETH
AlgoPerf leaderboards are out! 🎉 Amazing third place with and thanks to @orvieto_antonio, @jonasgeiping, @ELLISInst_Tue! 1/n
@MLCommons #AlgoPerf results are in! 🏁 $50K prize competition yielded 28% faster neural net training with non-diagonal preconditioning beating Nesterov Adam. New SOTA for hyperparameter-free algorithms too! Full details in our blog. mlcommons.org/2024/08/mlc-al… #AIOptimization #AI
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
LLMs can now track states, finally matching this cat! And we prove it. But how? 🧵👇 1/ Paper: arxiv.org/abs/2411.12537 with @julien_siems @jkhfranke @ZelaArber @FrankRHutter @MPontil
If two models are more similar to each other than a third on ImageNet, will this hold for medical/satellite images? Our preprint analyzes how vision model similarities generalize across datasets, the factors that influence them, and their link to downstream task behavior. 🧵1/7
The new call for Principal Investigators at the ELLIS Institute Tübingen is out! 🚀 We are recruiting Principal Investigators as Hector Endowed Fellows of the ELLIS Institute Tübingen in the areas of Machine Learning, Artificial Intelligence, and related fields. The positions…
Join us today at 13.30 in #ICML to learn how to navigate across scaling laws and how to accelerate your training! Poster #1007
Scaling laws predict the minimum required amount of compute to reach a given performance, but can we do better? Yes, if we allow for a flexible "shape" of the model! 🤸
Our Next Generation Sequence Modeling Architectures workshop proposal was accepted by ICML! We have an incredible lineup of speakers, please come say hi and consider submitting your work! :)
Feeling very fortunate to co-organize this workshop with an incredible group of researchers, Razvan Pascanu, @orvieto_antonio, Carmen Amo Alonso, and Maciej Wołczyk!
🎙 The first episode of the @Cyber_Valley Podcast with our Principal Investigators is now out! 🚀 @Orvieto_Antonio #AIPodcast #AIResearch #AI 🔗 Learn more: institute-tue.ellis.eu/en/news/cyber-…
Monograph on "Formal Aspects of Language Modeling" from @ryandcotterell et al. arxiv.org/abs/2311.04329 It would be so nice if everyone read this and we had shared foundations. Particularly for interpretability.
Why can the learning rate in neural networks transfer from small to large models (both in width and depth)? It turns out that the sharpness dynamics can explain it. Check out our new work! arxiv.org/abs/2402.17457 w/ @alexmeterez (co-first), @orvieto_antonio and T. Hofmann
If you are looking for a PhD position at the intersection of Deep Learning and Optimization, it's not too late to apply to my group at @MPI_IS and @ELLISforEurope Institute Tübingen! Send a DM if you are interested :) institute-tue.ellis.eu/research-group…
I’ll be presenting "Scaling MLPs" at #NeurIPS2023, tomorrow (Wed) at 10:45am! Hyped to discuss things like inductive bias, the bitter lesson, compute-optimality and scaling laws 👷⚖️📈