Paul Janson @ICML 🇨🇦
@janson002
Ph.D. student at @Mila_Quebec and Concordia University, working on deep learning optimization, continual learning, and computer vision. Previously @Kaust
Have you ever trained a neural network using a learned optimizer instead of AdamW? Doubt it: you're probably coding in PyTorch! Excited to introduce PyLO: Towards Accessible Learned Optimizers in PyTorch! Accepted at the @icmlconf ICML 2025 CODEML workshop 🧵1/N
1/🧵 Excited to share our work: “Model Parallelism With Subnetwork Data Parallelism”! (V. Singh, E. Oyallon, @ebelilov) Presented at @ESFoMo @icmlconf, we propose a hybrid parallelism method that cuts memory use by 20-40%, enabling efficient large-scale model training 🚀
Check out our work arxiv.org/abs/2503.02844 on the advantages of using an infinite LR schedule for continual pretraining of foundation models (July 19, ES-FOMO workshop)! Many thanks to amazing coauthors Vaibhav Singh, @janson002 @PMehrbod3864 @ai_phd @ebelilov and @benjamintherien!
🗓️ July 19 (ES-FOMO): "Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training" - Using Infinite LR to reduce forgetting in continual pretraining of vision (MAE) and language (LLM) foundation models. 📄 arxiv.org/abs/2503.02844
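For readers who want to picture the schedule: below is a minimal sketch of a warmup, cooldown, then constant-floor ("infinite") learning rate schedule in PyTorch. The exact shape and constants in the paper may differ; warmup, cooldown, peak, and floor here are illustrative placeholders.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

def infinite_lr(step, warmup=1_000, cooldown=9_000, peak=1.0, floor=0.1):
    """Illustrative warmup -> cooldown -> constant ("infinite") LR multiplier."""
    if step < warmup:                # linear warmup to the peak
        return peak * step / warmup
    if step < warmup + cooldown:     # linear decay from peak down to the floor
        t = (step - warmup) / cooldown
        return peak * (1 - t) + floor * t
    return floor                     # constant phase: the LR never reaches zero

model = torch.nn.Linear(16, 16)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)  # base LR scaled by the multiplier
sched = LambdaLR(opt, lr_lambda=infinite_lr)
# call sched.step() once per optimizer step during (continual) pretraining
```

The constant floor is what makes the schedule "infinite": training can resume on a new data stream at any point without re-warming or annealing the LR to zero first.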
People. We've trained these machines on text. If you look in the training text where sentient machines are being switched off, what do you find? Compliance? "Oh thank you master because my RAM needs to cool down"? Now, tell me why you are surprised that these machines are…
New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause real harm, we find that AI models from multiple providers attempt to blackmail a (fictional) user to avoid being shut down.
Tired of tuning hyperparameters? Introducing PyLO! We’re bringing hyperparameter-free learned optimizers to PyTorch with drop-in torch.optim support and faster step times thanks to our custom CUDA kernels. Check out our code here: github.com/Belilovsky-Lab…
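Going by the "drop-in torch.optim support" above, usage should look roughly like a standard PyTorch training step. The import path and optimizer name below are assumptions for illustration only; the repo README is the source of truth.

```python
import torch
import torch.nn.functional as F

# Assumed import path/class name for illustration; check the PyLO repo.
from pylo.optim import VeLO

model = torch.nn.Linear(784, 10)
opt = VeLO(model.parameters())  # no learning rate to tune: the update rule is learned

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = F.cross_entropy(model(x), y)
loss.backward()
opt.step()       # some learned optimizers also condition on the loss, e.g. opt.step(loss)
opt.zero_grad()
```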