Alexandre TL
@AlexandreTL2
Intern at @LinguaCustodia in Paris. (Pre|post)-training LLMs
Montpellier, France
Joined January 2020
295 Following
733 Followers
Pinned
Alexandre TL @AlexandreTL2 · Jul 31
muP works great for Mamba! Zero-shot transferred the learning rate from a 172k model to a 105 model. Now part of mamba.py 👇🧵
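For context, here is a minimal sketch of the idea behind zero-shot learning-rate transfer under muP (with Adam): hidden weight matrices get their LR scaled down by the width ratio, while vector-like parameters keep the base LR, so an LR tuned on a small proxy carries over to a wider model. The grouping heuristic and names below are illustrative assumptions, not mamba.py's actual implementation.

```python
import torch

def mup_param_groups(model, base_lr, base_d_model, d_model):
    """Build optimizer param groups with muP-style LR scaling (Adam).

    Assumption: 2-D hidden weights are "matrix-like" and get lr = base_lr / width_mult,
    everything else (biases, norms, embeddings here) keeps base_lr. A real setup
    would also apply muP's init and output-multiplier rules.
    """
    matrix_like, vector_like = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # Heuristic split; embedding/output layers would need their own rules.
        if p.ndim >= 2 and "embedding" not in name:
            matrix_like.append(p)
        else:
            vector_like.append(p)
    width_mult = d_model / base_d_model
    return [
        {"params": matrix_like, "lr": base_lr / width_mult},
        {"params": vector_like, "lr": base_lr},
    ]

# Usage sketch: LR tuned on a tiny proxy (d_model=64) reused for a wider model (d_model=1024).
# optimizer = torch.optim.AdamW(
#     mup_param_groups(big_model, base_lr=3e-3, base_d_model=64, d_model=1024)
# )
```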

Alexandre TL Retweeted
Seunghyun Seo @SeunghyunSEO7 · May 13
btw, i wrote a post about "how to scale" based on what i've learned over the past few months. it covers muP, HP scaling laws, and some other stuff. would be happy to get any feedback or discussion. (it's pretty verbose and no TL;DR, sorry lol) howtoscalenn.github.io