Francis Bach
@BachFrancis
Researcher in machine learning
For good probabilistic predictions, you should use post-hoc calibration. With @Eugene_Berta, Michael Jordan, and @BachFrancis, we argue that early stopping and hyperparameter tuning should account for this! Using the loss after post-hoc calibration often avoids premature stopping. 🧵1/
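A minimal sketch of the idea, assuming temperature scaling as the post-hoc calibration step (helper names like train_one_epoch, predict_logits, X_val, y_val are hypothetical placeholders): early stopping monitors the validation loss computed *after* calibration rather than the raw validation loss.

```python
# Sketch: early stopping on the post-hoc-calibrated validation loss.
import numpy as np
from scipy.optimize import minimize_scalar

def nll(logits, labels, temperature=1.0):
    """Cross-entropy of softmax(logits / temperature) against integer labels."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)               # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def calibrated_nll(logits, labels):
    """Validation NLL after post-hoc calibration by temperature scaling."""
    res = minimize_scalar(lambda t: nll(logits, labels, t),
                          bounds=(0.05, 20.0), method="bounded")
    return nll(logits, labels, res.x)

# Early-stopping loop (train_one_epoch / predict_logits are placeholders):
# best, patience = np.inf, 0
# for epoch in range(max_epochs):
#     train_one_epoch(model)
#     score = calibrated_nll(predict_logits(model, X_val), y_val)
#     if score < best - 1e-4:
#         best, patience = score, 0
#     else:
#         patience += 1
#         if patience >= 5:
#             break
```

The point made in the thread is that a model whose raw validation loss has plateaued because of miscalibration can still be improving once recalibrated, so the calibrated criterion keeps training going.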
Tired of lengthy computations to derive scaling laws? This post is made for you: discover the sharpness of the z-transform! francisbach.com/z-transform/
What if AI isn’t about building solo geniuses, but designing social systems? Michael Jordan advocates blending ML, economics, and uncertainty management to prioritize social welfare over mere prediction. A must-read rethink. arxiv.org/abs/2507.06268…
Happy to have our recent papers on conformal prediction with e-values presented at COLT by my advisor @BachFrancis! Full details here: 📚arxiv.org/abs/2503.13050 📚arxiv.org/abs/2505.13732 #COLT2025
Big thanks to the COLT 2025 organizers for an awesome event in Lyon! Here are the slides from my keynote this morning in case you’re curious about the references I mentioned: di.ens.fr/~fbach/fbach_o…
Take a break from the heat and check it out!
I’ll be presenting our paper at COLT in Lyon this Monday at the Predictions and Uncertainty workshop — come say hi if you're around! 👋 Check out @DHolzmueller's thread below 👇 #COLT2025
Congrats Fred!
My former PhD student Fred Kunstner has been awarded the @c_a_i_a_c Best Doctoral Dissertation Award: cs.ubc.ca/news/2025/06/f… His thesis on machine learning algorithms includes an EM proof "from the book", an explanation of why Adam works, and the first provably faster hyper-gradient method.
We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. It's surprising how deep one can delve into this, and how beautiful it can become. With (and only thanks to) the amazing Alexandre and @BachFrancis arxiv.org/pdf/2502.09287
Please RT: The PAISS summer school is back with an amazing lineup of speakers (and more to come). Spread the word!
Announcing: The 2nd International Summer School on Mathematical Aspects of Data Science, EPFL, Sept 1–5, 2025. Speakers: Bach (@BachFrancis), Bandeira, Mallat, Montanari (@Andrea__M), Peyré (@gabrielpeyre). For PhD students & early-career researchers. Application deadline: May 15
📣 Want to understand everything (or almost everything) about #IA? The book "Tout comprendre (ou presque) sur l'intelligence artificielle" by Olivier Cappé and Claire Marc is available online now, and in print from Thursday, April 10 ➡️ cnrseditions.fr/catalogue/soci… 🤝 @CNRSEd
A free book: Learning Theory from First Principles by @BachFrancis. It covers a bunch of key topics from machine learning (ML) theory and practice, such as:
- Math basics
- Supervised learning
- Generalization, overfitting & adaptivity
- Tools to design learning algorithms
- …
Characterizing finely the decay of eigenvalues of kernel matrices: many people need it, but explicit references are hard to find. This blog post reviews amazing asymptotic results from Harold Widom (1963!) and proposes new non-asymptotic bounds. francisbach.com/spectrum-kerne…
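As a quick numerical companion (not taken from the post), here is a sketch that builds a Gaussian kernel matrix on uniform 1-D samples and prints its leading eigenvalues; their very fast decay is the kind of behaviour the classical asymptotics describe. The sample size, bandwidth, and 1/n normalization are illustrative choices.

```python
# Sketch: eigenvalue decay of a Gaussian (RBF) kernel matrix on uniform samples.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)

# Kernel matrix K_ij = exp(-(x_i - x_j)^2 / (2 sigma^2)), normalized by 1/n
sigma = 0.5
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2)) / n

# Leading eigenvalues decay extremely fast (much faster than any polynomial)
eigvals = np.sort(np.linalg.eigvalsh(K))[::-1]
for k in (1, 2, 5, 10, 20, 40):
    print(f"lambda_{k:>2d} = {eigvals[k - 1]:.3e}")
```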

🚨 New paper on regression and classification! Adding to the discussion on least-squares vs. cross-entropy, and regression vs. classification formulations of supervised problems! A thread on how to bridge these problems:
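For readers who want the two formulations side by side, here is a toy comparison (an illustration only, not the paper's bridging method): the same binary problem fit with a least-squares loss on ±1-encoded labels versus with cross-entropy, using scikit-learn.

```python
# Sketch: regression (least-squares) vs. classification (cross-entropy) losses.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Ridge, LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Regression formulation: least squares on labels encoded as -1 / +1.
reg = Ridge(alpha=1.0).fit(X_tr, 2 * y_tr - 1)
acc_reg = ((reg.predict(X_te) > 0).astype(int) == y_te).mean()

# Classification formulation: logistic regression (cross-entropy loss).
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc_clf = (clf.predict(X_te) == y_te).mean()

print(f"least-squares accuracy : {acc_reg:.3f}")
print(f"cross-entropy accuracy : {acc_clf:.3f}")
```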
Michael Jordan is indeed one of the greatest thinkers in the history of AI 🐐 Economics, incentives (mechanism design), information flow, creativity, cooperation, greed and power struggles are important topics that we crucially need to understand better for the benefit of…
An inspirational talk by Michael Jordan: a refreshing, deep, and forward-looking vision for AI beyond LLMs. youtube.com/live/W0QLq4qEm…

Nice picture. Probably a book worth buying…
[🔴#IAScienceSociety] @BachFrancis (@inria_paris - @ENS_ParisSaclay) takes part in the symposium "The Mathematics of #MachineLearning" alongside @gabrielpeyre, Gilles Louppe (@UniversiteLiege), Gersende Fort (@CNRS), and Vianney Perchet (@INSAE @IP_Paris_).
Learning rate schedules seem mysterious? Turns out that their behaviour can be described with a bound from *convex, nonsmooth* optimization. Short thread on our latest paper 🚇 arxiv.org/abs/2501.18965
The sudden loss drop when annealing the learning rate at the end of a WSD (warmup-stable-decay) schedule can be explained without relying on non-convexity or even smoothness: a new paper shows that it can be precisely predicted by theory in the convex, non-smooth setting! 1/2
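For reference, a plain WSD schedule looks like the sketch below (parameter names and phase fractions are illustrative, not taken from the paper); the sudden loss drop discussed above happens in the final decay phase.

```python
# Sketch: a warmup-stable-decay (WSD) learning-rate schedule as a function of step.
def wsd_lr(step, total_steps, peak_lr=1e-3, warmup_frac=0.05, decay_frac=0.2, final_lr=0.0):
    warmup_steps = int(warmup_frac * total_steps)
    decay_steps = int(decay_frac * total_steps)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:                      # linear warmup
        return peak_lr * (step + 1) / warmup_steps
    if step < stable_end:                        # constant "stable" phase
        return peak_lr
    # linear decay (annealing) phase, where the sudden loss drop is observed
    progress = (step - stable_end) / max(decay_steps, 1)
    return peak_lr + (final_lr - peak_lr) * progress

# Example: learning rate at a few points of a 10k-step run.
for s in (0, 250, 500, 5000, 8000, 9000, 9999):
    print(s, f"{wsd_lr(s, 10_000):.2e}")
```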