Volkan Cevher
@CevherLIONS
Associate Professor of Electrical Engineering, EPFL. Amazon Scholar (AGI Foundations). IEEE Fellow. ELLIS Fellow.
@caglarml and I are excited to share our lecture slides for our EE-628 Training Large Language Models course: epfl.ch/labs/lions/tea… If you have any feedback, please reach out to us. I am also at #ICLR25.
1/n If you are developing a new IL algorithm that alternates between reward and SAC updates, read about this new trick named SOAR! arxiv.org/abs/2502.19859 It has guarantees in tabular environments and halves training time in MuJoCo ;) ICML work with Stefano and @CevherLIONS
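For readers outside imitation learning, here is a bare-bones structural sketch of what "alternating between reward and SAC updates" means in adversarial IL. The callables below are placeholders of my own for illustration; this is not the SOAR algorithm itself:

```python
import random

# Structural sketch: adversarial imitation learning that alternates a reward
# (discriminator) update with an off-policy, SAC-style policy update.
# collect_rollouts, reward_update, and sac_update are user-supplied
# placeholders for illustration; this is NOT the SOAR method from the paper.

def alternating_il(expert_data, collect_rollouts, reward_update, sac_update,
                   num_iters=100, batch_size=64):
    replay = []
    for _ in range(num_iters):
        replay.extend(collect_rollouts())                      # roll out the current policy
        expert_batch = random.sample(expert_data, min(batch_size, len(expert_data)))
        policy_batch = random.sample(replay, min(batch_size, len(replay)))
        reward_fn = reward_update(expert_batch, policy_batch)  # reward step: separate expert from policy data
        sac_update(policy_batch, reward_fn)                    # policy step: SAC on rewards relabeled by reward_fn
```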
Actually it's even older! Spectral stochastic gradient descent from 2015!
I will give the presentation today at 4pm in the #ICML2025 Oral session: Learning Dynamics 2 @ West Ballroom B! Here are the poster and long-version slides (lfhsgre.org/files/talk_LoR…) if you're interested.
(1/n) 🚀Thrilled to share our LoRA-One work (arxiv.org/abs/2502.01235) as an #ICML25 𝐨𝐫𝐚𝐥 𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧, w. Fanghui @Fanghui_SgrA (Warwick) and Yudong (Madison). Oral @ West Ballroom B, 4pm on July 17th. Poster @ West Exhibition Hall B2-B3 #W 905, 4:30PM on July 15th.
Excited to give a tutorial with @leenaCvankadara on Training Neural Networks at Any Scale (TRAINS) @icmlconf at 13:30 (West Ballroom A). Our slides can be found here: go.epfl.ch/ICML25TRAINS Please join us.
Join our ML Theory group next week as they welcome @tonysilveti on July 3rd for a presentation on "Training neural networks at any scale" Thanks to @itsmaddox_j @aniervs and @ThangChu77 for organizing this session 👏 Learn more: cohere.com/events/Cohere-…
🚨 Panel on "how are theoretical tools useful in vision?" with an amazing list of panelists: @CevherLIONS @orussakovsky @vidal_rene. Open to your questions; the more ambitious, the better. At @CVPR: Room 107A at 12 🎸.
We purposely made it great at optimal transport, as you may have guessed!
Just tested this model on a few challenging math questions and I found it very helpful. Magistral keeps doubting its answers ("wait, but...") & trying to improve them, which makes it great at exploring & exploiting knowledge from its training data (and it's fast). Congrats Mistral!
If you cite Muon, I think you should definitely cite SSD (proceedings.mlr.press/v38/carlson15.…) by @CevherLIONS et al. (sorry I can't find the handle of other authors) -- which proposed spectral descent.
Finally, we have expert sample complexity bounds in multi-agent imitation learning! arxiv.org/pdf/2505.17610 Joint work with @TFreihaut, @CevherLIONS, Matthieu and @gio_ramponi
The Sun photographed for more than a year from the same spot at the same time ♾
A short and sweet proof of convergence of steepest descent w.r.t. an arbitrary norm in the nonconvex (but smooth) setting.
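For context, here is the standard shape of such an argument (a generic sketch of the well-known bound, not the specific proof linked above). If $f$ is $L$-smooth w.r.t. a norm $\|\cdot\|$, i.e. $f(y) \le f(x) + \langle \nabla f(x), y-x\rangle + \tfrac{L}{2}\|y-x\|^2$, take the steepest-descent step $x_{k+1} = x_k + \tfrac{\|\nabla f(x_k)\|_*}{L}\, d_k$ with $d_k \in \arg\min_{\|d\|\le 1} \langle \nabla f(x_k), d\rangle$, so that $\langle \nabla f(x_k), d_k\rangle = -\|\nabla f(x_k)\|_*$ (the dual norm). Plugging the step into the smoothness inequality gives
$$ f(x_{k+1}) \le f(x_k) - \frac{\|\nabla f(x_k)\|_*^2}{2L}, $$
and telescoping over $k = 0,\dots,K-1$ yields
$$ \min_{k<K} \|\nabla f(x_k)\|_*^2 \le \frac{2L\,\big(f(x_0)-f^\star\big)}{K}. $$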
I have an opening for a post-doc position: I am looking for smart people with a strong CV in optimization and/or online learning. All my ex post-docs (@kwangsungjun, @mingruiliuCS, and @emsaad_p) became assistant professors; I'd like to continue this trend 😉 Please share it!
This will be presented at ICML!
We have new results on arXiv: arxiv.org/abs/2502.11673 :)
@CevherLIONS and I have spent a lot of time preparing these course materials on the "foundations of training LLMs." Now, we are excited to share them with the broader community. These lectures touch on both theoretical (like MuP) and empirical aspects of training LLMs.
We'll be presenting our work at #ICLR2025 today on "Efficient Interpolation between Extragradient and Proximal Methods for Weak MVIs", which is joint work with @CevherLIONS and Ioannis Mavrothalassitis. Stop by if you're interested in games and nonconvexity!
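For background (a standard textbook recap, not the scheme proposed in the paper): for an operator $F$, the extragradient method takes an extrapolation step followed by the actual update,
$$ \bar{x}_k = x_k - \gamma F(x_k), \qquad x_{k+1} = x_k - \gamma F(\bar{x}_k), $$
whereas the proximal-point method uses the implicit step $x_{k+1} = x_k - \gamma F(x_{k+1})$. Weak Minty variational inequalities (weak MVIs) relax monotonicity by only asking that $\langle F(x), x - x^\star\rangle \ge -\rho\,\|F(x)\|^2$ for some $\rho \ge 0$ and a solution $x^\star$.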
Caltech's "Probability in High Dimensions" by Prof. Joel A. Tropp PDF: tropp.caltech.edu/notes/Tro21-Pr…
FYI if you suspect that a reviewer used an LLM to generate an ICML review, you can report them at this link: docs.google.com/forms/d/e/1FAI…
Part of the confusion comes from the fact that on Twitter, Muon is used to refer to several distinct algorithms (which continue to evolve). The original Muon does not have HP transfer, but if you're happy to call anything that uses Newton-Schulz "Muon", then you can claim a lot of things.
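For concreteness, here is a minimal sketch (my illustration, not Muon's actual code) of the classical cubic Newton-Schulz iteration that such optimizers use to orthogonalize a gradient matrix, i.e. to approximate the $UV^\top$ factor of its SVD, which is also the direction steepest descent w.r.t. the spectral norm (and hence spectral descent) moves along. Muon's implementation uses a tuned higher-order polynomial rather than the cubic below:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=20):
    """Approximate U @ Vt from the SVD G = U S Vt using the classical cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
    Illustration only; Muon uses a tuned higher-order polynomial variant."""
    X = G / (np.linalg.norm(G) + 1e-12)   # Frobenius scaling puts all singular values in (0, 1]
    transposed = X.shape[0] > X.shape[1]
    if transposed:                         # iterate on the orientation with the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X    # pushes every singular value toward 1
    return X.T if transposed else X

# Quick check: the result should approach U @ Vt as the number of steps grows.
G = np.random.randn(64, 32)
U, S, Vt = np.linalg.svd(G, full_matrices=False)
print(np.linalg.norm(newton_schulz_orthogonalize(G) - U @ Vt))
```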