jianlin.su

@Jianlin_S

Grad is all you need @Kimi_Moonshot Blog: https://jianlin.su , Cool Papers: https://papers.cool

Joined February 2025

13 Following

2K Followers

Pinned

jianlin.su@Jianlin_S · Jun 6

kexue.fm/archives/11006 introduces the idea of using matrices and their msign to perform general operations on the singular values, including singular value clipping, step functions, and arbitrary polynomials (not just odd polynomials). @leloykun @YouJiacheng @_arohan_

4.0K
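For readers new to the notation, here is a minimal reference sketch of what msign and singular value clipping mean, written directly with an SVD in NumPy. It is not the linked article's method (which, per the post, builds these operations from msign itself rather than from an SVD). The function names `msign_svd` and `mclip_svd`, the clipping range, and the rank tolerance `tol` are assumptions made for illustration.

```python
import numpy as np

def msign_svd(m, tol=1e-12):
    """Matrix sign via SVD: every nonzero singular value is replaced by 1."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    keep = s > tol  # drop (numerically) zero singular directions
    return u[:, keep] @ vt[keep]

def mclip_svd(m, lo=0.0, hi=1.0):
    """Singular value clipping via SVD: singular values are clipped to [lo, hi]."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    # Broadcasting scales each column of u by the matching clipped singular value.
    return (u * np.clip(s, lo, hi)) @ vt
```

For a diagonal matrix with positive entries, such as `np.diag([3.0, 0.5])`, `msign_svd` returns the identity and `mclip_svd` returns `np.diag([1.0, 0.5])`.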

jianlin.su@Jianlin_S · Jul 21

kexue.fm/archives/11175 Extends the previous article to compute any G P^{-s/r}.

1.0K

jianlin.su@Jianlin_S · Jul 19

kexue.fm/archives/11158 A neat method for computing P^{1/2}, P^{-1/2}, and GP^{-1/2}, reusing the coefficients of msign.

3.0K

jianlin.su Retweeted

Kimi.ai@Kimi_Moonshot · Jul 11

🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…

280

1.0K

7.0K

3.0K

2.5M

jianlin.su@Jianlin_S · Jul 10

kexue.fm/archives/11111 how MLA works (2)

6.0K

jianlin.su@Jianlin_S · Jul 1

Discusses efficient inversion of matrices of the form (Λ + QK.T * M), which commonly arise in modern linear attention mechanisms. kexue.fm/archives/11072

2.0K

jianlin.su@Jianlin_S · Jun 23

The latest method for calculating mclip (3 non-nested msign calls, very low error): kexue.fm/archives/11059 @leloykun @YouJiacheng @_arohan_

4.0K

jianlin.su@Jianlin_S · Jun 20

The Linear Attention Odyssey: From Imitation to Innovation and Back kexue.fm/archives/11033

100

12.0K

jianlin.su@Jianlin_S · Jun 13

kexue.fm/archives/11025 Discusses how to compute the derivative of the msign operator. This might be helpful if you are interested in “TTT + Muon” combinations such as arxiv.org/abs/2505.23884

130

9.0K

jianlin.su@Jianlin_S · Jun 5

kexue.fm/archives/10996 The latest progress in finding better Newton-Schulz iterations for the msign operator. It directly derives the theoretically optimal solution through the equioscillation theorem and a greedy transformation. Original paper: papers.cool/arxiv/2505.169…

3.0K
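As background for the post above, a minimal sketch of the classic cubic Newton-Schulz iteration for msign, assuming a full-rank input. The coefficients 1.5 and -0.5 are the textbook ones, not the optimized coefficients derived in the linked article, and the function name `msign_newton_schulz` and the default step count are illustrative assumptions.

```python
import numpy as np

def msign_newton_schulz(m, steps=30):
    """Approximate msign(m) = U V^T with the classic cubic Newton-Schulz iteration."""
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which is inside the convergence region of the iteration.
    x = m / np.linalg.norm(m)
    for _ in range(steps):
        # Each step maps a singular value s to 1.5*s - 0.5*s**3, pushing it toward 1
        # while leaving the singular vectors unchanged.
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x
```

Only matrix multiplications are needed, which is why iterations of this family are attractive on accelerators. Small singular values grow by a factor of about 1.5 per step, so ill-conditioned inputs need more steps than the default used here.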

jianlin.su@Jianlin_S · Jun 2

kexue.fm/archives/10972 This article introduces the Equioscillation Theorem for polynomial best approximation, as well as the problem of differentiation of the infinity norm related to it.

4.0K

jianlin.su@Jianlin_S · May 26

kexue.fm/archives/10958 This article centers on the recently released MeanFlow and discusses the acceleration of diffusion model generation from the perspective of “average velocity.”

5.0K

jianlin.su@Jianlin_S · May 16

kexue.fm/archives/10945 Shared Expert and Fine-Grained Expert in MoE.

1.0K

jianlin.su@Jianlin_S · May 4

Exploring the Magic Behind MLA: Why Is It So Effective? kexue.fm/archives/10907 - Larger head_dims = better performance? - Partial RoPE = secret sauce? - KV-Shared = added boost? MLA's success may lie in its unique combination of these elements. 🚀 #MLA

141

23.0K

jianlin.su@Jianlin_S · Apr 26

the gradient of SVD: kexue.fm/archives/10878

17.0K

jianlin.su@Jianlin_S · Apr 18

A Novel RoPE: kexue.fm/archives/10862

169

34.0K