Jeremy Bernstein
@jxbz
🧪 @thinkymachines ✍️ anon feedback @ http://admonymous.co/jxbz
I just wrote my first blog post in four years! It is called "Deriving Muon". It covers the theory that led to Muon and how, for me, Muon is a meaningful example of theory leading practice in deep learning (1/11)
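For readers who just want the gist: the update the post derives is (roughly) momentum SGD where each 2D update matrix is replaced by a semi-orthogonal approximation, computed with a Newton-Schulz iteration. A minimal sketch, using the quintic coefficients from Keller Jordan's public Muon implementation (this is my paraphrase, not code from the post):

```python
import torch

def newton_schulz5(G, steps=5, eps=1e-7):
    # Approximately orthogonalize G (map it toward U V^T from its SVD)
    # with a quintic Newton-Schulz iteration. Coefficients are from
    # Keller Jordan's public Muon implementation.
    a, b, c = (3.4445, -4.7750, 2.0315)
    X = G.bfloat16()
    X = X / (X.norm() + eps)  # ensure spectral norm <= 1
    if G.size(0) > G.size(1):
        X = X.T               # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X.to(G.dtype)

def muon_step(W, grad, buf, lr=0.02, momentum=0.95):
    # One sketch of a Muon step for a 2D weight W: accumulate momentum,
    # then apply the orthogonalized momentum matrix as the update.
    # (The real implementation adds details like Nesterov momentum
    # and shape-dependent learning-rate scaling.)
    buf.mul_(momentum).add_(grad)
    W.add_(newton_schulz5(buf), alpha=-lr)
```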

👀
Apparently Dion is now being worked on for Torch Titan: github.com/pytorch/torcht… :-)
Considering Muon is so popular and validated at scale, we've just decided to welcome a PR for it in PyTorch core by default. If anyone wants to take a crack at it... github.com/pytorch/pytorc…
Still a relative newbie, but I am very excited about this team and what we are building
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence…
Holy shit. Kimi K2 was pre-trained on 15.5T tokens using MuonClip with zero training spike. Muon has officially scaled to the 1-trillion-parameter LLM level. Many doubted it could scale, but here we are. So proud of the Muon team: @kellerjordan0, @bozavlado, @YouJiacheng,…
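For context, MuonClip (per the Kimi K2 report) pairs Muon with "QK-Clip", which tames exploding attention logits by rescaling a head's query/key projections whenever its max logit exceeds a threshold. A minimal sketch of that rescaling rule as I understand it; the threshold value, names, and the assumption that max_logit is tracked per head during the forward pass are all illustrative, not Kimi's code:

```python
import torch

@torch.no_grad()
def qk_clip(W_q, W_k, max_logit, tau=100.0):
    # If this head's largest pre-softmax attention logit exceeded tau,
    # shrink W_q and W_k by sqrt(gamma) each; logits are bilinear in
    # (W_q, W_k), so they scale down by gamma = tau / max_logit.
    if max_logit > tau:
        gamma = tau / max_logit
        W_q.mul_(gamma ** 0.5)
        W_k.mul_(gamma ** 0.5)
```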
midjourney introduces video generation and it’s surpassing all my expectations.
Announcing 𝐟𝐥𝐚𝐬𝐡-𝐦𝐮𝐨𝐧: a 🐍 pkg with customized CUDA kernels that aim to speed up the Muon optimizer: github.com/nil0x9/flash-m… 1/n
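I haven't studied the kernels, but the hot spot is visible from the math: each Newton-Schulz step is a handful of large matmuls, and X @ X.T has symmetric output that a plain GEMM recomputes in full. A rough way to measure the baseline cost in stock PyTorch (matrix size and loop count are illustrative; requires a CUDA device; this does not use flash-muon's actual API):

```python
import time
import torch

# Time the matmuls inside one quintic Newton-Schulz step, the work a
# fused/custom kernel would target.
X = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
torch.cuda.synchronize()
t0 = time.time()
for _ in range(100):
    A = X @ X.T                          # symmetric output
    B = -4.7750 * A + 2.0315 * (A @ A)
    Y = 3.4445 * X + B @ X
torch.cuda.synchronize()
print(f"{(time.time() - t0) / 100 * 1e3:.2f} ms per NS step")
```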
Pretty wild to see work that I contributed to (e.g., AlgoPerf, Crowded Valley @robinschmidt_) included in a university course. I feel very honored.
Lecture 11: benchmarking optimizers
1. the problem: comparing optimizers (sgd, adam, etc.) in deep learning is tricky.
2. challenge 1: defining "speed". curves cross, so use time-to-result (sketch below).
3. challenge 2: hyperparameter tuning trap. protocol matters more than algo? (choi et…
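The "time-to-result" idea from point 2 is simple to operationalize: fix a target metric value in advance and report how long each optimizer takes to reach it, instead of comparing loss at an arbitrary fixed step (where curves can cross). A toy sketch; the function names and the step-count budget are illustrative:

```python
def steps_to_target(train_step, eval_loss, target, max_steps=10_000):
    # Run training until validation loss first reaches `target`;
    # return the step count, or None if the budget runs out.
    # Ranking optimizers by this number sidesteps crossing curves.
    for step in range(1, max_steps + 1):
        train_step()
        if eval_loss() <= target:
            return step
    return None
```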