Shikai Qiu
@ShikaiQiu
ML PhD student, Scaling | Prev SR @GoogleDeepMind, Physics @UCBerkeley
While scaling laws typically predict only the final loss, we show in our ICML oral paper that good scaling rules enable accurate prediction of the entire loss curves of larger models from those of smaller ones! w/@Locchiu, @andrewgwils, J. Pennington, A. Agarwala: arxiv.org/abs/2507.02119 1/10
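
For context, the "final loss" baseline mentioned above is usually a Chinchilla-style power-law fit: fit a saturating power law to the final losses of small training runs, then extrapolate a single number for a larger run. The sketch below illustrates that standard workflow on synthetic numbers; it is not the paper's method for predicting full loss curves.

```python
# Minimal sketch of the usual scaling-law workflow (synthetic data, not the
# paper's method): fit L(C) = E + A * C**(-alpha) to the *final* losses of
# small runs and extrapolate the final loss of a much larger run.
import numpy as np
from scipy.optimize import curve_fit

def final_loss_law(compute, E, A, alpha):
    """Saturating power law for the final loss as a function of training compute."""
    return E + A * compute ** (-alpha)

rng = np.random.default_rng(0)
compute = np.logspace(17, 19, 6)                     # compute of small runs (FLOPs)
true_loss = final_loss_law(compute, 1.8, 60.0, 0.09) # made-up "measurements"
observed = true_loss + rng.normal(0.0, 0.01, size=compute.shape)

params, _ = curve_fit(final_loss_law, compute, observed, p0=(2.0, 50.0, 0.1))
print("extrapolated final loss at 1e21 FLOPs:", final_loss_law(1e21, *params))
```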

In our new ICML paper, we show that popular families of OOD detection procedures, such as feature- and logit-based methods, are fundamentally misspecified: they answer a different question than “is this point from a different distribution?” arxiv.org/abs/2507.01831 [1/7]
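
For readers unfamiliar with the terminology, "feature- and logit-based" refers to OOD scores computed from a trained classifier's internals. Two standard logit-based baselines are sketched below for context; they are well-known scores from the literature, not methods proposed in this paper, and the tweet's point is that scores of this kind answer a different question than intended.

```python
# Standard examples of "logit-based" OOD scores (context only).
# "Feature-based" methods instead score penultimate-layer features,
# e.g. by Mahalanobis distance to class means.
import numpy as np
from scipy.special import logsumexp, softmax

def max_softmax_prob(logits: np.ndarray) -> np.ndarray:
    """Maximum softmax probability; higher is treated as 'more in-distribution'."""
    return softmax(logits, axis=-1).max(axis=-1)

def energy_score(logits: np.ndarray) -> np.ndarray:
    """Log-sum-exp of logits (energy score); higher is treated as 'more in-distribution'."""
    return logsumexp(logits, axis=-1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))   # fake logits from a 10-class classifier
print(max_softmax_prob(logits))
print(energy_score(logits))
```
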
Why do larger language models generalize better? In our new ICLR paper, we derive an interpretable generalization bound showing that compute-optimal LLMs provably generalize better with scale! 📄arxiv.org/abs/2504.15208 1/7🧵
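
As a point of reference, generalization bounds typically bound the population risk by the empirical risk plus a complexity-over-sample-size term; "generalizing better with scale" means that gap shrinks as compute-optimal model and dataset sizes grow together. The block below shows a generic textbook Occam-style bound of this shape, not the bound derived in the paper.

```latex
% Generic Occam-style bound for a loss bounded in [0,1] (textbook form; NOT the
% bound derived in the paper). For a hypothesis h with prior probability P(h),
% with probability at least 1 - \delta over a sample of size n:
\[
  R(h) \;\le\; \hat{R}_n(h)
  + \sqrt{\frac{\ln \tfrac{1}{P(h)} + \ln \tfrac{1}{\delta}}{2n}},
\]
% where R(h) is the population risk, \hat{R}_n(h) the empirical risk, and
% \ln \tfrac{1}{P(h)} plays the role of a complexity (description-length) term.
```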