Paul Janson @ICML 🇨🇦
@janson002
Ph.D. student at @Mila_Quebec and Concordia University, working on deep learning optimization, continual learning, and computer vision. Previously @Kaust
Have you ever trained a neural network using a learned optimizer instead of AdamW? Doubt it: you're probably coding in PyTorch! Excited to introduce PyLO: Towards Accessible Learned Optimizers in PyTorch! Accepted at the @icmlconf ICML 2025 CODEML workshop 🧵1/N
1/🧵 Excited to share our work: “Model Parallelism With Subnetwork Data Parallelism”! (V. Singh, E. Oyallon, @ebelilov) Presented at @ESFoMo @icmlconf, we propose a hybrid parallelism method that cuts memory use by 20-40%, enabling efficient large-scale model training 🚀
Check out our work arxiv.org/abs/2503.02844 on the advantages of using an infinite LR schedule for continual pretraining of foundation models (July 19, ES-FOMO workshop)! Many thanks to amazing coauthors Vaibhav Singh, @janson002 @PMehrbod3864 @ai_phd @ebelilov and @benjamintherien!
🗓️ July 19 (ES-FOMO): "Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training" - Using Infinite LR to reduce forgetting in continual pretraining of vision (MAE) and language (LLM) foundation models. 📄 arxiv.org/abs/2503.02844
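For readers who want to picture the schedule: below is a minimal sketch of a warmup, cooldown, then constant-floor ("infinite") learning rate schedule in PyTorch. The exact shape and constants in the paper may differ; warmup, cooldown, peak, and floor here are illustrative placeholders.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

def infinite_lr(step, warmup=1_000, cooldown=9_000, peak=1.0, floor=0.1):
    """Illustrative warmup -> cooldown -> constant ("infinite") LR multiplier."""
    if step < warmup:                # linear warmup to the peak
        return peak * step / warmup
    if step < warmup + cooldown:     # linear decay from peak down to the floor
        t = (step - warmup) / cooldown
        return peak * (1 - t) + floor * t
    return floor                     # constant phase: the LR never reaches zero

model = torch.nn.Linear(16, 16)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)  # base LR scaled by the multiplier
sched = LambdaLR(opt, lr_lambda=infinite_lr)
# call sched.step() once per optimizer step during (continual) pretraining
```

The constant floor is what makes the schedule "infinite": training can resume on a new data stream at any point without re-warming or annealing the LR to zero first.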
People. We've trained these machines on text. If you look in the training text where sentient machines are being switched off, what do you find? Compliance? "Oh thank you master because my RAM needs to cool down"? Now, tell me why you are surprised that these machines are…
New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause real harm, we find that AI models from multiple providers attempt to blackmail a (fictional) user to avoid being shut down.
Tired of tuning hyperparameters? Introducing PyLO! We’re bringing hyperparameter-free learned optimizers to PyTorch with drop-in torch.optim support and faster step times thanks to our custom CUDA kernels. Check out our code here: github.com/Belilovsky-Lab…
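Going by the "drop-in torch.optim support" above, usage should look roughly like a standard PyTorch training step. The import path and optimizer name below are assumptions for illustration only; the repo README is the source of truth.

```python
import torch
import torch.nn.functional as F

# Assumed import path/class name for illustration; check the PyLO repo.
from pylo.optim import VeLO

model = torch.nn.Linear(784, 10)
opt = VeLO(model.parameters())  # no learning rate to tune: the update rule is learned

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = F.cross_entropy(model(x), y)
loss.backward()
opt.step()       # some learned optimizers also condition on the loss, e.g. opt.step(loss)
opt.zero_grad()
```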