Michael Eli Sander
@m_e_sander
Research Scientist at Google DeepMind
🚨🚨New ICML 2024 Paper: arxiv.org/abs/2402.05787 How do Transformers perform In-Context Autoregressive Learning? We investigate how causal Transformers learn simple autoregressive processes of order 1, with @RGiryes, @btreetaiji, @mblondel_ml and @gabrielpeyre 🙏
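For readers new to the setup: an order-1 autoregressive process generates each token from the previous one through a fixed matrix. A minimal NumPy sketch of the data-generating process, with a least-squares in-context baseline (the estimator is our illustration, not the paper's construction):

```python
import numpy as np

# Toy order-1 autoregressive process: s_{t+1} = W s_t for a fixed
# matrix W. The paper asks how a causal Transformer learns such a
# process in context; here we only simulate the process and fit a
# least-squares baseline (our illustration, not the paper's model).
rng = np.random.default_rng(0)
d, T = 8, 32

# Random orthogonal W (QR factorization of a Gaussian matrix).
W, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Roll out the context s_0, s_1 = W s_0, ..., s_T.
s = [rng.standard_normal(d)]
for _ in range(T):
    s.append(W @ s[-1])
S = np.stack(s)  # shape (T + 1, d)

# Estimate W from the (s_t, s_{t+1}) pairs seen in context,
# then predict the next token.
X, Y = S[:-1], S[1:]
W_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
print("estimation error:", np.linalg.norm(W_hat - W))
print("next-token prediction ok:", np.allclose(W_hat @ S[-1], W @ S[-1]))
```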

🚨New paper alert🚨: arxiv.org/abs/2410.01537 How do Transformers retrieve information that is sparsely concentrated in a few tokens? e.g., the label can change by flipping a single word. To explain this, we introduce a new statistical task and show that attention solves it ⬇️
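A toy rendering of the idea (our illustrative assumption, not the paper's exact statistical task): the signal lives in a single token of the sequence, and one softmax attention read-out with a matched query recovers it.

```python
import numpy as np

# Toy sparse-retrieval task: the label-carrying signal sits in ONE
# token of the sequence, and a single softmax attention read-out
# whose query matches the signal direction retrieves it.
rng = np.random.default_rng(0)
d, T = 16, 20

tokens = rng.standard_normal((T, d))      # noise tokens
signal = rng.standard_normal(d)
pos = rng.integers(T)                     # plant the signal once
tokens[pos] += 5.0 * signal

scores = tokens @ signal                  # query = signal direction
weights = np.exp(scores - scores.max())
weights /= weights.sum()
readout = weights @ tokens                # ~ the planted token
print("attention mass on the signal token:", round(float(weights[pos]), 3))
```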
Distillation is becoming a major paradigm for training LLMs, but its success and failure modes remain quite mysterious. Our paper introduces the phenomenon of "teacher hacking" and studies how to mitigate it. arxiv.org/abs/2502.02671 More details in the thread below.
1/ If you’re familiar with RLHF, you likely heard of reward hacking —where over-optimizing the imperfect reward model leads to unintended behaviors. But what about teacher hacking in knowledge distillation: can the teacher be hacked, like rewards in RLHF?
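To make the proxy concrete, here is the usual token-level distillation objective in PyTorch (a generic sketch; the paper's exact loss, KL direction, and temperature handling may differ):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Token-level KL(teacher || student). Teacher hacking is about
    # over-optimizing this proxy: the student keeps improving against
    # the imperfect teacher while drifting from the true distribution.
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logp, t_probs, reduction="batchmean") * temperature**2

# Usage on dummy logits of shape (batch, vocab_size).
student = torch.randn(4, 100, requires_grad=True)
teacher = torch.randn(4, 100)
distillation_loss(student, teacher).backward()
```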
Really proud of these two companion papers by our team at GDM: 1) Joint Learning of Energy-based Models and their Partition Function arxiv.org/abs/2501.18528 2) Loss Functions and Operators Generated by f-Divergences arxiv.org/abs/2501.18537 A thread.
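For context on why the partition function is the hard part (a textbook identity, not these papers' contribution): an energy-based model is only defined up to its normalizer, which enters the loss directly,

```latex
p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z(\theta)},
\qquad
Z(\theta) = \int e^{-E_\theta(x)}\,\mathrm{d}x,
\qquad
-\log p_\theta(x) = E_\theta(x) + \log Z(\theta).
```

Jointly learning an estimate of the log-partition term alongside the energy is the route the first paper's title points to.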
I’m at NeurIPS this week :) Friday, presenting our spotlight work: Watermarking Makes LLMs Radioactive ☢️ (arxiv.org/abs/2402.14904) Sunday, speaking at the image watermarking workshop about our latest Watermark Anything work (arxiv.org/abs/2411.07231) DM me if you’d like to chat :)
♟️Mastering Board Games by External and Internal Planning with Language Models♟️ I'm happy to finally share storage.googleapis.com/deepmind-media… TL;DR: In chess, our planning agents effectively reach grandmaster-level strength with a comparable search budget to that of human players!
I'm excited to share a new paper: "Mastering Board Games by External and Internal Planning with Language Models" storage.googleapis.com/deepmind-media… (also soon to be up on arXiv, once it's been processed there)
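As a sketch of what "external" planning means here (search wrapped around the model), a one-ply loop with python-chess, where a material-count stub stands in for the learned value model (the stub and function names are our assumptions, not the paper's agent):

```python
import chess  # python-chess

def value(board: chess.Board) -> float:
    # Stub for the learned value model: plain material count from the
    # side-to-move's perspective (the paper scores positions with a
    # language model; this stub is our placeholder).
    vals = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
            chess.ROOK: 5, chess.QUEEN: 9}
    score = sum(v * (len(board.pieces(pt, chess.WHITE))
                     - len(board.pieces(pt, chess.BLACK)))
                for pt, v in vals.items())
    return score if board.turn == chess.WHITE else -score

def external_plan(board: chess.Board) -> chess.Move:
    # One-ply external search: try each legal move, score the child
    # position with the value model, keep the best (negamax sign flip,
    # since the child position is from the opponent's perspective).
    best_move, best_score = None, float("-inf")
    for move in board.legal_moves:
        board.push(move)
        score = -value(board)
        board.pop()
        if score > best_score:
            best_move, best_score = move, score
    return best_move

print(external_plan(chess.Board()))
```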
Thank you for the opportunity to talk about my research and my experiences! Thanks to my thesis advisors @gabrielpeyre and @RemiGribonval for your supervision 😊
📽️We interviewed @SibylleMarcotte, PhD student at @ENS_ULM, member of the Ockham team, winner 🏆 of the France 2024 L'Oréal-@UNESCO Young Talents Prize #ForWomenInScience ▶️her research and her advice for girls who want to become #scientists :) @UnivLyon1 @ENSdeLyon
☢️ Some news about radioactivity ☢️ - We got a Spotlight at NeurIPS! 🥳 and we will be in Vancouver with @pierrefdz to present! - We have just released our code for radioactivity detection at github.com/facebookresear….
OpenAI may secretly know that you trained on GPT outputs! In our work "Watermarking Makes Language Models Radioactive", we show that training on watermarked text can be easily spotted ☢️ Paper: arxiv.org/abs/2402.14904 @pierrefdz @AIatMeta @Polytechnique @Inria
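The detection principle can be sketched with a "green list" style watermark test (a generic stand-in for the paper's detector; function and parameter names are ours):

```python
import math

def greenlist_zscore(tokens, is_green, gamma=0.5):
    # Detection in the style of "green list" LLM watermarks: under the
    # null hypothesis each token is green with probability gamma, so a
    # one-sided z-test on the green-token count flags watermarked, and
    # hence "radioactive", text with a quantifiable p-value.
    n = len(tokens)
    greens = sum(1 for t in tokens if is_green(t))
    return (greens - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)

# Toy usage: a pseudorandom green list over token ids; unwatermarked
# token streams should give z-scores near 0.
z = greenlist_zscore(list(range(1000)),
                     is_green=lambda t: hash((t, "salt")) % 2 == 0)
print("z-score:", z)
```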
🔒Image watermarking is promising for digital content protection. But images often undergo many modifications—spliced or altered by AI. Today at @AIatMeta, we released Watermark Anything that answers not only "where does the image come from," but "what part comes from where." 🧵
Six years at Google today! 🎉 From 🇨🇦 to 🇨🇭, optimizing everything in sight. Grateful for the incredible journey and amazing colleagues!
🏆Didn't get the Physics Nobel Prize this year, but really excited to share that I've been named one of the #FWIS2024 @FondationLOreal-@UNESCO French Young Talents alongside 34 amazing young researchers! This award recognizes my research on deep learning theory #WomenInScience 👩‍💻
#FWIS2024 🎖️@SibylleMarcotte, PhD student in the mathematics and applications department of ENS @psl_univ, is among the winners of the France 2024 Young Talents Prize @FondationLOreal @UNESCO #ForWomenInScience @AcadSciences @4womeninscience Congratulations to her!!! 👏
🥳🥳 Thrilled to share that I've joined Google DeepMind as a Research Scientist. Super excited for what's to come!

After a very constructive back and forth with editors and reviewers of @NatureComms, scConfluence has now been published @LauCan88 @gabrielpeyre ! I'll present it this afternoon at the poster session of @ECCBinfo (P296) Published version: nature.com/articles/s4146…
🥳 I’m very happy to announce our preprint biorxiv.org/content/10.110… ! scConfluence combines uncoupled autoencoders with Inverse Optimal Transport to integrate unpaired multimodal single-cell data in a shared low-dimensional latent space. @LauCan88 @gabrielpeyre
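A rough sketch of the alignment idea only (the random-projection encoders, the Sinkhorn cost, and all names are our illustrative assumptions; the actual model uses uncoupled autoencoders trained with Inverse Optimal Transport):

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed dependency)

# Sketch of the high-level idea: embed two unpaired modalities into
# the same latent space, then use an entropy-regularized OT cost
# between the two embedding clouds as an alignment signal.
rng = np.random.default_rng(0)
rna = rng.standard_normal((100, 2000))   # toy stand-in for RNA counts
prot = rng.standard_normal((80, 100))    # toy stand-in for protein counts

enc_rna = rng.standard_normal((2000, 16)) / np.sqrt(2000)
enc_prot = rng.standard_normal((100, 16)) / np.sqrt(100)
z_rna, z_prot = rna @ enc_rna, prot @ enc_prot

M = ot.dist(z_rna, z_prot)                      # pairwise latent costs
a, b = ot.unif(len(z_rna)), ot.unif(len(z_prot))
align_cost = ot.sinkhorn2(a, b, M, reg=0.1)     # entropic OT cost
print("alignment cost:", float(align_cost))
```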
"Transformers are Universal In-context Learners": in this paper, we show that deep transformers with a fixed embedding dimension are universal approximators for an arbitrarily large number of tokens. arxiv.org/abs/2408.01367
🎉 New preprint! biorxiv.org/content/10.110… STORIES learns a differentiation potential from spatial transcriptomics profiled at several time points using Fused Gromov-Wasserstein, an extension of Optimal Transport. @gabrielpeyre @LauCan88
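For readers who want to try Fused Gromov-Wasserstein itself, a minimal example with the POT library (a generic illustration of the tool, not STORIES; assumes POT's current fused_gromov_wasserstein API):

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed dependency)

# FGW matches two point clouds by trading off a feature cost (as in
# plain OT) against preservation of intra-cloud structure (as in
# Gromov-Wasserstein).
rng = np.random.default_rng(0)
n, m = 10, 12
X, Y = rng.standard_normal((n, 2)), rng.standard_normal((m, 2))    # coordinates
fX, fY = rng.standard_normal((n, 5)), rng.standard_normal((m, 5))  # features

M = ot.dist(fX, fY)                     # cross-cloud feature costs
C1, C2 = ot.dist(X, X), ot.dist(Y, Y)   # intra-cloud structure matrices
p, q = ot.unif(n), ot.unif(m)           # uniform marginals

# alpha interpolates between pure OT (alpha=0) and pure GW (alpha=1).
T = ot.gromov.fused_gromov_wasserstein(M, C1, C2, p, q,
                                       loss_fun="square_loss", alpha=0.5)
print("coupling shape:", T.shape)       # (n, m) transport plan
```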
🚨🚨 AI in Bio release 🧬 Very happy to share my work on a Large Cell Model for Gene Network Inference. It is for now just a preprint and more is to come. We are asking the question: “What can 50M cells tell us about gene networks?” ❓Behind it, other questions arose like:…
We uploaded a v2 of our book draft "The Elements of Differentiable Programming" with many improvements (~70 pages of new content) and a new chapter on differentiable data structures (lists and dictionaries). arxiv.org/abs/2403.14606
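As a taste of what a differentiable data structure looks like, a soft dictionary read in NumPy (a sketch in the spirit of the new chapter, not the book's code):

```python
import numpy as np

def soft_lookup(query, keys, values, temperature=1.0):
    # A differentiable dictionary read: instead of an exact key match,
    # return a softmax-weighted average of the stored values, which is
    # smooth in query, keys, and values, hence admits gradients.
    scores = keys @ query / temperature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

keys = np.eye(3)                    # three stored keys
values = np.array([1.0, 2.0, 3.0])  # their associated values
print(soft_lookup(np.array([10.0, 0.0, 0.0]), keys, values))  # ~1.0
```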
Come and see us today at 1:30 pm at spot #411 for our poster session!!
You didn’t believe Differentially Private training was possible for foundation models? We achieved the same performance as a non-private MAE trained on the same dataset, but with rigorous DP guarantees. Code is released: github.com/facebookresear…. Presenting tomorrow at ICML, 11:30AM poster, #2313
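For reference, the core DP-SGD recipe that such training builds on, as a generic PyTorch sketch (not the paper's exact MAE setup; the per-sample loop is kept naive for clarity):

```python
import torch

def dp_sgd_step(model, loss_fn, batch, lr=0.1, clip=1.0, noise_mult=1.0):
    # One DP-SGD step: clip each per-sample gradient to bound
    # sensitivity, then add Gaussian noise calibrated to the clipping
    # norm before updating the parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    grad_sum = [torch.zeros_like(p) for p in params]
    for x, y in batch:  # naive per-sample loop, for clarity only
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        scale = min(1.0, clip / (norm.item() + 1e-12))
        for g, p in zip(grad_sum, params):
            g += p.grad * scale
    with torch.no_grad():
        for g, p in zip(grad_sum, params):
            noise = noise_mult * clip * torch.randn_like(g)
            p -= lr * (g + noise) / len(batch)

# Toy usage on a linear model.
model = torch.nn.Linear(4, 1)
batch = [(torch.randn(4), torch.randn(1)) for _ in range(8)]
dp_sgd_step(model, torch.nn.functional.mse_loss, batch)
```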