David Hall
@dlwh
Research Engineering Lead at @StanfordCRFM. Previously co-founder at Semantic Machines ⟶ MSFT. Lead developer of Levanter and Marin @[email protected]
.@StanfordCRFM's Marin project has released the first fully open model in JAX. It’s an 'open lab' sharing the entire research process - including code, data, and logs, to enable reproducibility and further innovation. developers.googleblog.com/en/stanfords-m…
Super excited to share SmolLM3, a new strong 3B model. SmolLM3 is fully open, we share the recipe, the dataset, the training codebase and much more! > Train on 11T token on 384 H100 for 220k GPU hours > Support long context up to 128k thanks to NoPE and intra document masking >…
Nice quick reproduction from @WilliamBarrHeld in Marin Speedrun! AdamC seems like an easy win
While doing WSD cooldowns for the marin.community project, this gradient increase led to problematic loss ascent. We patched it with Z-loss, but AdamC feels better™️. So over the weekend, I ran 4 experiments—130M to 1.4B params—all at ~compute-optimal token counts...🧵