Ji-Ha
@Ji_Ha_Kim
Joined January 2024
Ji-Ha@Ji_Ha_Kim · May 30
In Muon, during dualization, we divide the gradient matrix by its Frobenius norm. Since ‖G‖₂ ≤ ‖G‖_F, this bounds the spectral norm by 1 and ensures the Newton-Schulz iteration converges. Have we compared performance against normalizing by the geometric mean of the matrix norms induced by the ℓ1 and ℓ∞ vector norms, √(‖G‖₁‖G‖∞), or by their minimum?
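The following is a minimal pure-Python sketch, not Muon's actual implementation, that compares the three candidate divisors. It relies on two standard facts: ‖A‖₂ ≤ ‖A‖_F and ‖A‖₂ ≤ √(‖A‖₁‖A‖∞). Dividing by either of these therefore keeps the spectral norm at most 1. By contrast, min(‖A‖₁, ‖A‖∞) is not an upper bound in general: a 9×1 column of ones has ‖A‖₂ = 3 but a minimum of 1. All helper names are hypothetical, and the spectral norm is only a power-iteration estimate.

```python
import math
import random


def induced_1_norm(A):
    """Max absolute column sum: the matrix norm induced by the l1 vector norm."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def induced_inf_norm(A):
    """Max absolute row sum: the matrix norm induced by the l-infinity vector norm."""
    return max(sum(abs(x) for x in row) for row in A)


def frobenius_norm(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def spectral_norm(A, iters=500, seed=0):
    """Estimate the largest singular value of A by power iteration on A^T A."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]           # A v
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]   # A^T A v
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]  # keep v a unit vector
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return math.sqrt(sum(x * x for x in Av))  # ||A v|| with ||v|| = 1


def candidate_scales(A):
    """Hypothetical candidate divisors for pre-normalizing a gradient matrix A."""
    n1, ninf = induced_1_norm(A), induced_inf_norm(A)
    return {
        "frobenius": frobenius_norm(A),    # always >= ||A||_2
        "geo_mean": math.sqrt(n1 * ninf),  # always >= ||A||_2
        "min": min(n1, ninf),              # NOT an upper bound on ||A||_2 in general
    }


if __name__ == "__main__":
    examples = {
        "identity 4x4": [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)],
        "ones column 9x1": [[1.0] for _ in range(9)],
        "cross 5x5": [[1.0 if i == 0 or j == 0 else 0.0 for j in range(5)] for i in range(5)],
    }
    for name, A in examples.items():
        s2 = spectral_norm(A)
        for key, scale in candidate_scales(A).items():
            # Spectral norm of the normalized matrix A / scale; > 1 means the bound failed.
            print(f"{name:16s} {key:9s} ||A/scale||_2 = {s2 / scale:.3f}")
```

Neither valid bound dominates the other. For the identity, the geometric mean is tight (1) while the Frobenius norm is 2. For the 5×5 "cross" matrix, the Frobenius norm (3) is tighter than the geometric mean (5). Which one helps Muon in practice would still have to be measured.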
