Benjamin Thérien

@benjamintherien

Ph.D. student at UdeM & Mila | Incoming Intern at Meta NYC | Distributed training & creating learned optimizers that generalize

Montréal, Québec

Joined November 2018

488 Following

313 Followers

Pinned

Benjamin Thérien@benjamintherien · May 30

Is AdamW the best inner optimizer for DiLoCo? Does the inner optimizer affect the compressibility of the DiLoCo delta? Excited to introduce MuLoCo: Muon is a practical inner optimizer for DiLoCo! 🧵 arxiv.org/abs/2505.23725 1/N

Benjamin Thérien Retweeted

Paul Janson @ICML 🇨🇦@janson002 · Jul 13

🧵 Super excited to present at two @icmlconf 2025 workshops in Vancouver 🇨🇦🍁!

Benjamin Thérien Retweeted

Andrei Mircea@mirandrom · Jul 12

Step 1: Understand how scaling improves LLMs. Step 2: Directly target underlying mechanism. Step 3: Improve LLMs independent of scale. Profit. In our ACL 2025 paper we look at Step 1 in terms of training dynamics. Project: mirandrom.github.io/zsl Paper: arxiv.org/pdf/2506.05447

Benjamin Thérien Retweeted

Massimo Caccia@MassCaccia · Jul 9

🎉 Our paper “How to Train Your LLM Web Agent: A Statistical Diagnosis” got an oral at next week’s ICML Workshop on Computer Use Agents! 🖥️🧠 We present the first large-scale…

Benjamin Thérien Retweeted

Ashwinee Panda@PandaAshwinee · Jul 8

our paper on CPT of MoEs was rejected from #COLM2025 w/scores of 8775. the only reject said "I decide between 5 and 6". we emailed PCs, but just got "We are sorry, but the venue simply does not have the capacity to provide feedback at a more granular level." from @yoavartzi. 🙁

Benjamin Thérien@benjamintherien · Jun 15

Tired of tuning hyperparameters? Introducing PyLO! We’re bringing hyperparameter-free learned optimizers to PyTorch with drop-in torch.optim support and faster step times thanks to our custom CUDA kernels. Check out our code here: github.com/Belilovsky-Lab…

Paul Janson @ICML 🇨🇦@janson002 · Jun 15

Have you ever trained a neural network using a learned optimizer instead of AdamW? Doubt it: you're probably coding in PyTorch! Excited to introduce PyLO: Towards Accessible Learned Optimizers in PyTorch! Accepted at the @icmlconf ICML 2025 CODEML workshop 🧵1/N

Benjamin Thérien Retweeted

Luke Rowe@Luke22R · Jun 12

🚀 Our method, Poutine, was the best-performing entry in the 2025 Waymo Vision-based End-to-End Driving Challenge at #CVPR2025! Our 3B-parameter VLM Poutine scored 7.99 RFS on the official test set, comfortably ahead of every other entry (see figure).

Benjamin Thérien Retweeted

Emiliano Penaloza@emilianopp_ · Jun 11

Excited that our paper "Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization" was accepted to ICML 2025! We show how Preference Optimization can reduce the impact of noisy concept labels in CBMs. 🧵/9

Benjamin Thérien Retweeted

Majdi Hassan@majdi_has · Jun 10

(1/n)🚨You can train a model solving DFT for any geometry almost without training data!🚨 Introducing Self-Refining Training for Amortized Density Functional Theory — a variational framework for learning a DFT solver that predicts the ground-state solutions for different…

Benjamin Thérien Retweeted

Quentin Anthony@QuentinAnthon15 · Jun 4

Inspired by “minimal implementation” projects in AI such as @karpathy’s nanoGPT, I worked to bring this concept to the HPC world! I’ve built a minimal implementation of an MPI library called nanoMPI, which focuses on clarity, simplicity, and easy installation.
