Kairong Luo
@openhonor
PhD Student @ Tsinghua University | Researching LLM
Come meet us at #ICLR2025! We'll be presenting our Multi-Power Law, a new approach to predicting full pretraining loss curves across LR schedules, during the poster session: Friday, April 25, 3:00 PM to 5:30 PM CST, Hall 3 + Hall 2B, Poster #237. We look forward to your feedback!
How does pretraining loss evolve under different LR schedules?
Meet our Multi-Power Law: it predicts the full loss curve for various schedules!
Accurate enough to optimize LR schedules directly.
Result? A WSD-like schedule that outperforms the rest!
Accepted at #ICLR2025
