Volkan Cevher
@CevherLIONS
Associate Professor of Electrical Engineering, EPFL. Amazon Scholar (AGI Foundations). IEEE Fellow. ELLIS Fellow.
@caglarml and I are excited to share our lecture slides for our EE-628 Training Large Language Models course: epfl.ch/labs/lions/tea… If you have any feedback, please reach out to us. I am also at #ICLR25.
1/n If you are developing a new IL algorithm that alternates between reward and SAC updates, read about this new trick named SOAR! arxiv.org/abs/2502.19859 It has guarantees in tabular environments and halves training time in MuJoCo ;) ICML work with Stefano and @CevherLIONS
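For readers outside imitation learning, here is a bare-bones structural sketch of what "alternating between reward and SAC updates" means in adversarial IL. The callables below are placeholders of my own for illustration; this is not the SOAR algorithm itself:

```python
import random

# Structural sketch: adversarial imitation learning that alternates a reward
# (discriminator) update with an off-policy, SAC-style policy update.
# collect_rollouts, reward_update, and sac_update are user-supplied
# placeholders for illustration; this is NOT the SOAR method from the paper.

def alternating_il(expert_data, collect_rollouts, reward_update, sac_update,
                   num_iters=100, batch_size=64):
    replay = []
    for _ in range(num_iters):
        replay.extend(collect_rollouts())                      # roll out the current policy
        expert_batch = random.sample(expert_data, min(batch_size, len(expert_data)))
        policy_batch = random.sample(replay, min(batch_size, len(replay)))
        reward_fn = reward_update(expert_batch, policy_batch)  # reward step: separate expert from policy data
        sac_update(policy_batch, reward_fn)                    # policy step: SAC on rewards relabeled by reward_fn
```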
Actually it's even older! Spectral stochastic gradient descent from 2015!
I will give the presentation today at 4pm in the #ICML2025 Oral session: Learning Dynamics 2 @ West Ballroom B! Here are the poster and long-version slides (lfhsgre.org/files/talk_LoR…) if you're interested.
(1/n) 🚀Thrilled to share our LoRA-One work (arxiv.org/abs/2502.01235) as an #ICML25 𝐨𝐫𝐚𝐥 𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧, w. Fanghui @Fanghui_SgrA (Warwick) and Yudong (Madison). Oral @ West Ballroom B, 4pm on July 17th. Poster @ West Exhibition Hall B2-B3 #W 905, 4:30PM on July 15th.
Excited to give a tutorial with @leenaCvankadara on Training Neural Networks at Any Scale (TRAINS) @icmlconf at 13:30 (West Ballroom A). Our slides can be found here: go.epfl.ch/ICML25TRAINS Please join us.
Join our ML Theory group next week as they welcome @tonysilveti on July 3rd for a presentation on "Training neural networks at any scale" Thanks to @itsmaddox_j @aniervs and @ThangChu77 for organizing this session 👏 Learn more: cohere.com/events/Cohere-…
🚨 Panel on "how are theoretical tools useful in vision?" with an amazing list of panelists: @CevherLIONS @orussakovsky @vidal_rene. Open to your questions; the more ambitious, the better. At @CVPR: Room 107A at 12 🎸.
We purposely made it great at optimal transport, as you may have guessed!
Just tested this model on a few challenging math questions and I found it very helpful. Magistral keeps doubting its answers ("wait, but...") & trying to improve them, which makes it great at exploring & exploiting knowledge from its training data (and it's fast). Congrats Mistral!
If you cite Muon, I think you should definitely cite SSD (proceedings.mlr.press/v38/carlson15.…) by @CevherLIONS et al. (sorry I can't find the handle of other authors) -- which proposed spectral descent.
Finally, we have expert sample complexity bounds in multi-agent imitation learning! arxiv.org/pdf/2505.17610 Joint work with @TFreihaut, @CevherLIONS, Matthieu and @gio_ramponi
The Sun photographed for more than a year from the same spot at the same time ♾
A short and sweet proof of convergence of steepest descent w.r.t. an arbitrary norm in the nonconvex (but smooth) setting.
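For context, here is the standard shape of such an argument (a generic sketch of the well-known bound, not the specific proof linked above). If $f$ is $L$-smooth w.r.t. a norm $\|\cdot\|$, i.e. $f(y) \le f(x) + \langle \nabla f(x), y-x\rangle + \tfrac{L}{2}\|y-x\|^2$, take the steepest-descent step $x_{k+1} = x_k + \tfrac{\|\nabla f(x_k)\|_*}{L}\, d_k$ with $d_k \in \arg\min_{\|d\|\le 1} \langle \nabla f(x_k), d\rangle$, so that $\langle \nabla f(x_k), d_k\rangle = -\|\nabla f(x_k)\|_*$ (the dual norm). Plugging the step into the smoothness inequality gives
$$ f(x_{k+1}) \le f(x_k) - \frac{\|\nabla f(x_k)\|_*^2}{2L}, $$
and telescoping over $k = 0,\dots,K-1$ yields
$$ \min_{k<K} \|\nabla f(x_k)\|_*^2 \le \frac{2L\,\big(f(x_0)-f^\star\big)}{K}. $$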
I have an opening for a post-doc position: I am looking for smart people with a strong CV in optimization and/or online learning. All my ex post-docs (@kwangsungjun, @mingruiliuCS, and @emsaad_p) became assistant professors; I'd like to continue this trend 😉 Please share it!
This will be presented at ICML!
We have new results on arXiv: arxiv.org/abs/2502.11673 :)
@CevherLIONS and I have spent a lot of time preparing these course materials on the "foundations of training LLMs." Now, we are excited to share them with the broader community. These lectures touch on both theoretical (like MuP) and empirical aspects of training LLMs.
We'll be presenting our work at #ICLR2025 today on "Efficient Interpolation between Extragradient and Proximal Methods for Weak MVIs", which is joint work with @CevherLIONS and Ioannis Mavrothalassitis. Stop by if you're interested in games and nonconvexity!
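For background (a standard textbook recap, not the scheme proposed in the paper): for an operator $F$, the extragradient method takes an extrapolation step followed by the actual update,
$$ \bar{x}_k = x_k - \gamma F(x_k), \qquad x_{k+1} = x_k - \gamma F(\bar{x}_k), $$
whereas the proximal-point method uses the implicit step $x_{k+1} = x_k - \gamma F(x_{k+1})$. Weak Minty variational inequalities (weak MVIs) relax monotonicity by only asking that $\langle F(x), x - x^\star\rangle \ge -\rho\,\|F(x)\|^2$ for some $\rho \ge 0$ and a solution $x^\star$.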
Caltech's "Probability in High Dimensions" by Prof. Joel A. Tropp PDF: tropp.caltech.edu/notes/Tro21-Pr…
FYI if you suspect that a reviewer used an LLM to generate an ICML review, you can report them at this link: docs.google.com/forms/d/e/1FAI…
Part of the confusion comes from the fact that on Twitter, Muon is used to refer to several distinct algorithms (which continue to evolve). The original Muon does not have HP transfer, but if you're happy to call anything that uses Newton-Schulz "Muon", then you can claim a lot of things.
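For concreteness, here is a minimal sketch (my illustration, not Muon's actual code) of the classical cubic Newton-Schulz iteration that such optimizers use to orthogonalize a gradient matrix, i.e. to approximate the $UV^\top$ factor of its SVD, which is also the direction steepest descent w.r.t. the spectral norm (and hence spectral descent) moves along. Muon's implementation uses a tuned higher-order polynomial rather than the cubic below:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=20):
    """Approximate U @ Vt from the SVD G = U S Vt using the classical cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
    Illustration only; Muon uses a tuned higher-order polynomial variant."""
    X = G / (np.linalg.norm(G) + 1e-12)   # Frobenius scaling puts all singular values in (0, 1]
    transposed = X.shape[0] > X.shape[1]
    if transposed:                         # iterate on the orientation with the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X    # pushes every singular value toward 1
    return X.T if transposed else X

# Quick check: the result should approach U @ Vt as the number of steps grows.
G = np.random.randn(64, 32)
U, S, Vt = np.linalg.svd(G, full_matrices=False)
print(np.linalg.norm(newton_schulz_orthogonalize(G) - U @ Vt))
```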