Alexandre TL
@AlexandreTL2
Intern at @LinguaCustodia in Paris. (Pre|post)-training LLMs
Montpellier, France
Joined January 2020
295 Following
733 Followers
Pinned
Alexandre TL @AlexandreTL2 · Jul 31
muP works great for Mamba! Zero-shot transferred the learning rate from a 172k model to a 105 model. Now part of mamba.py 👇🧵
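For context, here is a minimal sketch of the idea behind zero-shot learning-rate transfer under muP (with Adam): hidden weight matrices get their LR scaled down by the width ratio, while vector-like parameters keep the base LR, so an LR tuned on a small proxy carries over to a wider model. The grouping heuristic and names below are illustrative assumptions, not mamba.py's actual implementation.

```python
import torch

def mup_param_groups(model, base_lr, base_d_model, d_model):
    """Build optimizer param groups with muP-style LR scaling (Adam).

    Assumption: 2-D hidden weights are "matrix-like" and get lr = base_lr / width_mult,
    everything else (biases, norms, embeddings here) keeps base_lr. A real setup
    would also apply muP's init and output-multiplier rules.
    """
    matrix_like, vector_like = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # Heuristic split; embedding/output layers would need their own rules.
        if p.ndim >= 2 and "embedding" not in name:
            matrix_like.append(p)
        else:
            vector_like.append(p)
    width_mult = d_model / base_d_model
    return [
        {"params": matrix_like, "lr": base_lr / width_mult},
        {"params": vector_like, "lr": base_lr},
    ]

# Usage sketch: LR tuned on a tiny proxy (d_model=64) reused for a wider model (d_model=1024).
# optimizer = torch.optim.AdamW(
#     mup_param_groups(big_model, base_lr=3e-3, base_d_model=64, d_model=1024)
# )
```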

Alexandre TL Retweeted
Seunghyun Seo @SeunghyunSEO7 · May 13
btw, i wrote a post about "how to scale" based on what i've learned over the past few months. it covers muP, HP scaling laws, and some other stuff. would be happy to get any feedback or discussion. (it's pretty verbose and no TL;DR, sorry lol) howtoscalenn.github.io