Margaret Li
@margs_li
👩‍💻 PhD student @UWCSE / @UWNLP & @MetaAI. Formerly RE @FacebookAI Research, @Penn CS | 🥯 certified bi-coastal bb IAH/PEK/PHL/NYC/SFO/SEA
We nearly drove ourselves insane trying to reproduce scaling laws papers. So of course we wrote a paper about it 😵‍💫 1/9
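(For context on what reproducing a scaling law involves mechanically: the core step is fitting a saturating power law to (scale, loss) measurements. Below is a minimal sketch of that fit, assuming made-up data points and the common form L(N) = a·N^(-b) + c; it is not the procedure from the paper.)

```python
# Minimal sketch of fitting a saturating power law L(N) = a * N**(-b) + c
# to (parameter count, loss) pairs. The data points below are invented for
# illustration; this is not the fitting procedure from the paper.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Loss as a function of parameter count: irreducible term c plus a
    # power-law term that decays with scale.
    return a * n ** (-b) + c

# Hypothetical (params, validation loss) measurements.
n_params = np.array([1e7, 3e7, 1e8, 3e8, 1e9, 3e9])
losses = np.array([4.10, 3.72, 3.41, 3.15, 2.95, 2.80])

# Initial guess p0 matters a lot when fitting power laws.
(a, b, c), _ = curve_fit(power_law, n_params, losses, p0=(10.0, 0.1, 2.0), maxfev=10000)
print(f"L(N) ~ {a:.2f} * N^(-{b:.3f}) + {c:.2f}")

# Extrapolate to a larger model to see what the fitted curve predicts.
print("predicted loss at 1e10 params:", power_law(1e10, a, b, c))
```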

I can't stop thinking about jart's optimization (ty for teaching!): the idea of keeping weights parked in swap when they're not needed. Plus this paper trains many LMs, each specialized on one cluster of the corpus, so you can get away with using only 2 or 4 of them at inference. arxiv.org/abs/2303.14177
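A rough sketch of that sparse-inference idea, assuming my reading of it is right: cluster the corpus, train one expert LM per cluster, and at inference time load and mix only the top-k experts whose clusters are closest to the current context. The embedding function, the stub experts, and the distance-based weighting below are all stand-ins for illustration, not the paper's code.

```python
# Illustrative sketch of sparse inference over cluster-specialized expert LMs:
# embed the context, find the k nearest cluster centroids, and mix only those
# experts' next-token distributions. The embedding, experts, and weighting
# scheme here are stand-ins, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, N_EXPERTS, TOP_K = 1000, 64, 16, 2

# Pretend each expert is an LM specialized on one corpus cluster; here it is
# just a fixed random next-token logit vector for illustration.
centroids = rng.normal(size=(N_EXPERTS, DIM))
expert_logits = rng.normal(size=(N_EXPERTS, VOCAB))

def embed(context_tokens: list[int]) -> np.ndarray:
    # Stand-in for a real context embedding (e.g. tf-idf or a small encoder).
    v = np.zeros(DIM)
    for t in context_tokens:
        v[t % DIM] += 1.0
    return v / max(len(context_tokens), 1)

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

def next_token_probs(context_tokens: list[int]) -> np.ndarray:
    h = embed(context_tokens)
    # Route: weight experts by (negative) distance to their cluster centroid,
    # then keep only the TOP_K closest and renormalize their weights.
    dists = np.linalg.norm(centroids - h, axis=1)
    weights = softmax(-dists)
    keep = np.argsort(dists)[:TOP_K]
    kept_w = weights[keep] / weights[keep].sum()
    # Only the TOP_K experts ever need to be resident in memory for this step;
    # the rest could stay paged out (e.g. in swap, per the tweet).
    probs = np.stack([softmax(expert_logits[i]) for i in keep])
    return kept_w @ probs

print(next_token_probs([3, 17, 42]).shape)  # (1000,)
```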
.@colinraffel, @margs_li, @SamuelAinsworth, and I are proposing a workshop on Collaborative, Communal, and Continual Machine Learning at NeurIPS 2023! If you'd like to be a reviewer for our workshop, please sign up here: forms.gle/QDinJ6xWviAkj1…
New paper alert!! ✨ Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models (PLMs) ✨ We evaluate how well PLMs translate words in context and then leverage this prompting setup to perform zero-shot WSD on 18 languages! 1/n
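Roughly, a translate-then-disambiguate setup could look like the sketch below: ask a PLM to translate the ambiguous word in context, score a handful of candidate translations by their log-likelihood under the model, and map the winning translation back to a sense. Everything concrete here (the gpt2 stand-in model, the prompt template, the French candidates and sense labels) is an assumption for illustration, not the paper's exact setup.

```python
# Illustrative zero-shot WSD via translation prompting with a causal LM:
# score a few candidate translations of the ambiguous word in context and
# map the best-scoring one back to a sense. Prompt, candidates, and sense
# labels are invented; the paper's setup may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # small stand-in model; the paper evaluates larger PLMs
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    # Sum of token log-probabilities of `continuation` given `prompt`.
    # Assumes tokenization of prompt+continuation splits cleanly at the boundary.
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    cont_ids = full_ids[0, prompt_ids.shape[1]:]
    positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(logprobs[p, t].item() for p, t in zip(positions, cont_ids))

sentence = "She sat by the bank and watched the river flow."
word = "bank"
# Hypothetical French candidates, each tied to one sense of "bank".
candidates = {"rive": "bank%riverside", "banque": "bank%financial_institution"}

prompt = f'In the sentence "{sentence}", the word "{word}" translates to French as "'
scores = {fr: continuation_logprob(prompt, fr + '"') for fr in candidates}
best = max(scores, key=scores.get)
print(best, "->", candidates[best])
```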
Sharing our project on 1) accelerating and 2) stabilizing training for large language-vision models. 1) Towards accelerating training, we introduce SwitchBack, a linear layer for int8 quantized training which matches bfloat16 within 0.1 percentage points for CLIP ViT-Huge: arxiv.org/abs/2304.13013
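For intuition on what an int8 linear layer does in the forward pass, here is a simplified sketch: dynamically quantize activations per row and weights per output feature to int8, do the matmul on the quantized values, then rescale. This only illustrates the generic int8-quantized-linear idea; as I understand it, SwitchBack's design is specifically about which matmuls (forward vs. the backward-pass ones) run in int8 vs. 16-bit, which the sketch does not capture.

```python
# Simplified int8 linear forward: quantize activations per row and weights
# per output feature to int8, multiply, then rescale back to floating point.
# This shows the generic int8 quantized-linear idea only, not SwitchBack's
# actual choice of which forward/backward matmuls stay in low precision.
import torch

def quantize_rowwise(x: torch.Tensor):
    # Symmetric per-row int8 quantization: scale each row so its max maps to 127.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127)
    return q, scale

def int8_linear_forward(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # x: (batch, in_features), w: (out_features, in_features)
    xq, x_scale = quantize_rowwise(x)   # one scale per input row
    wq, w_scale = quantize_rowwise(w)   # one scale per output feature (row of w)
    # Emulated int8 matmul (real kernels would run this on int8 tensor cores).
    acc = xq @ wq.t()
    # Dequantize: each entry picks up its row's input scale and column's weight scale.
    return acc * x_scale * w_scale.t()

x = torch.randn(4, 256)
w = torch.randn(128, 256)
approx = int8_linear_forward(x, w)
exact = x @ w.t()
print("max abs error vs. full precision:", (approx - exact).abs().max().item())
```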