Stefan Horoi
@stefanhoroi
PhD student at @UMontreal and @Mila_Quebec, currently working on model merging and representation comparison.
🔎Do better expert models always lead to better model merging & MoErging? And how does the duration of expert training affect model upcycling? We tackle these questions in our recent work: “Less is More: Undertraining Experts Improves Model Upcycling” 🧵1/N
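For readers new to the area: the simplest model-merging baseline just averages the parameters of experts fine-tuned from a shared base model. Below is a minimal sketch of that baseline, assuming compatible PyTorch state dicts; `average_state_dicts` is an illustrative helper, not the method studied in the paper.

```python
import torch

def average_state_dicts(state_dicts: list[dict]) -> dict:
    """Element-wise mean of several compatible state dicts (same keys/shapes).
    A common baseline for merging experts fine-tuned from one base model."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }
```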
How do MoE transformers, like DeepSeek, behave under distribution shifts? Do their routers collapse? Can they still match full re-training performance? Excited to present “Continual Pre-training of MoEs: How robust is your router?”!🧵arxiv.org/abs/2503.05029 1/N
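For context on what the "router" is: in an MoE layer, a small learned gate scores each token against every expert and dispatches it to its top-k choices. Here is a minimal sketch of a generic top-k router in PyTorch; this is the common textbook formulation, not DeepSeek's exact implementation (which adds load balancing and other details).

```python
import torch
import torch.nn.functional as F

class TopKRouter(torch.nn.Module):
    """Generic learned top-k router: a linear gate scores every token for
    every expert, and each token is sent to its k highest-scoring experts."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = torch.nn.Linear(d_model, n_experts, bias=False)
        self.k = k

    def forward(self, x: torch.Tensor):
        # x: (n_tokens, d_model) -> logits: (n_tokens, n_experts)
        logits = self.gate(x)
        weights, expert_ids = logits.topk(self.k, dim=-1)
        # Renormalize the gate scores over the selected experts only
        weights = F.softmax(weights, dim=-1)
        return weights, expert_ids
```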
Very excited to present our paper "Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis" at @icmlconf 2024! Come see our poster tomorrow, Wed. July 24th, 1:30-3 pm. Paper: openreview.net/forum?id=hLuNV… Code: github.com/shoroi/align-n… @Mila_Quebec #ICML2024
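Since the title names Canonical Correlation Analysis: CCA finds linear projections under which two sets of features are maximally correlated, which is one way to align two networks' representations before merging them. Below is a minimal NumPy sketch of classical CCA on activation matrices collected from the same inputs; `cca_directions` is illustrative only, the paper's actual procedure is in the linked code.

```python
import numpy as np

def cca_directions(X: np.ndarray, Y: np.ndarray, eps: float = 1e-8):
    """Classical CCA via SVD of the whitened cross-covariance.

    X, Y: (n_samples, d1) and (n_samples, d2) activation matrices
    from two networks evaluated on the same inputs."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric PSD matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, corrs, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    # Columns of the returned bases project X and Y onto maximally
    # correlated directions; corrs are the canonical correlations.
    return Wx @ U, Wy @ Vt.T, corrs
```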
My most sincere thanks to the Schulich Foundation, Mr. Seymour Schulich, and the Université de Montréal! #2017SLSquad
