Fabian Schaipp
@FSchaipp
working on optimization for machine learning. currently postdoc @inria_paris. sbatch and apero.
Learning rate schedules seem mysterious? Turns out that their behaviour can be described with a bound from *convex, nonsmooth* optimization. Short thread on our latest paper: arxiv.org/abs/2501.18965
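For context, the flavor of bound meant here is the classical guarantee for the subgradient method on a convex, G-Lipschitz objective with step sizes η_t; this is the textbook version, not necessarily the exact refinement used in the paper:

$$\min_{1 \le t \le T} f(x_t) - f(x^\star) \;\le\; \frac{\|x_1 - x^\star\|^2 + G^2 \sum_{t=1}^{T} \eta_t^2}{2 \sum_{t=1}^{T} \eta_t}$$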
The sudden loss drop when annealing the learning rate at the end of a WSD (warmup-stable-decay) schedule can be explained without relying on non-convexity or even smoothness: a new paper shows that it can be precisely predicted by theory in the convex, non-smooth setting! 1/2
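For readers unfamiliar with WSD, here is a minimal sketch of such a schedule (linear warmup, constant plateau, linear cooldown to zero); the peak LR and phase fractions below are illustrative placeholders, not values from the paper:

```python
import numpy as np

def wsd_schedule(T, peak_lr=1e-3, warmup_frac=0.05, decay_frac=0.2):
    """Warmup-stable-decay (WSD) schedule over T steps.

    Linear warmup to peak_lr, constant plateau, then linear anneal to zero
    over the final decay_frac of training. All values are illustrative.
    """
    t = np.arange(1, T + 1)
    t_warm = int(warmup_frac * T)          # end of warmup phase
    t_decay = int((1 - decay_frac) * T)    # start of cooldown phase

    lr = np.full(T, peak_lr, dtype=float)
    lr[:t_warm] = peak_lr * t[:t_warm] / max(t_warm, 1)              # linear warmup
    lr[t_decay:] = peak_lr * (T - t[t_decay:]) / max(T - t_decay, 1)  # linear cooldown
    return lr

schedule = wsd_schedule(T=10_000)
```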
FYI Adam will be on holiday the entire August
Anyone know Adam?
still holds in 2025
#ICML2024 oral sessions, but every speaker still needs to connect their own laptop for the slides
The best tutorials are those that don't just promote the speakers' own work. #ICML2025
Come check out our poster on understanding LR schedules at ICML. Thursday, 11am.

Pogacar hasn't had a bad day since TdF 2023, stage 17. Quite astonishing. #TDF2025
A paper that contains both the words "sigma-algebra" and "SwiGLU activations" ☑️ Also interesting results on embedding layer LRs.


We uploaded V3 of our draft book "The Elements of Differentiable Programming". Lots of typo fixes, clarity improvements, new figures and a new section on Transformers! arxiv.org/abs/2403.14606
is it only allowed to write papers on μP if you use the most unintuitive notation?
what are the best resources for training and inference setups in diffusion models? ideally with (pseudo-)code
Optimization is the natural language of applied mathematics.
now accepted at #ICML 2025!
Learning rate schedules seem mysterious? Turns out that their behaviour can be described with a bound from *convex, nonsmooth* optimization. Short thread on our latest paper: arxiv.org/abs/2501.18965
biggest tech improvement in a while: my (Android) phone can now open arXiv PDFs in the browser without downloading them