Yacine Mahdid
@yacinelearning
(neuro/ai) I make technical deep learning tutorials 👺
if there is one thing you must not do, it's surrender. don't surrender your dreams, your passion, your curiosity, or your freedom. never

I’m tweeting from here btw, I just pace back and forth on the trail while sipping espresso

everyone meet alex, very smart engineer/researcher. he co-authored a bunch of interesting LLM papers worth a look 🤝
Share some with me bro
people slowly realizing how simple stuff is in deep learning doesn’t get old

some X employees started to follow me on here and I don’t know how to tell them
I knew at a young age that cyanide was a dangerous compound for a single reason

most technical topics look complicated because they're like 7+ simple things mushed together. untangle, then understand
4000 of you folks follow me to learn about deep learning. it's absolutely amazing. you are all here for deep learning and nothing else, just the deep learning part, wow

New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
day started with a discussion about both the adam optimizer and the og adam from eden. incredible meta going on right now

15 min tutorial on the adam optimizer. by the end of it you will understand what is up with the formula, 100%. you'll see, it's not that complicated™️
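since we're talking about the formula, here's a minimal numpy sketch of a single adam step so you can see all the moving parts at once (the variable names and function layout are mine; β1=0.9, β2=0.999, ε=1e-8 are the usual defaults):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: two EMAs plus bias correction, then a scaled step."""
    m = beta1 * m + (1 - beta1) * grad       # EMA of the gradient (momentum)
    v = beta2 * v + (1 - beta2) * grad**2    # EMA of the squared gradient
    m_hat = m / (1 - beta1**t)               # bias correction, t starts at 1
    v_hat = v / (1 - beta2**t)               # (EMAs start at 0, so early steps read low)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return theta, m, v
```

that's the whole formula: momentum on the gradient, a running scale estimate, and a correction for the zero initialization. nothing else.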

> low value work (like data cleaning)
uh...
Academia must be the only industry where extremely high-skilled PhD students spend much of their time doing low value work (like data cleaning). A 1st year management consultant outsources this immediately. Imagine the productivity gains if PhDs could focus on thinking
The new Qwen3 update takes back the benchmark crown from Kimi 2. Some highlights of how Qwen3 235B-A22B differs from Kimi 2:
- 4.25x smaller overall but has more layers (transformer blocks); 235B vs 1 trillion
- 1.5x fewer active parameters (22B vs. 32B)
- much fewer experts in…
Kimi K2 🆚 Qwen-3-235B-A22B-2507
The newly updated Qwen 3 model beats Kimi K2 on most benchmarks. The jump on the ARC-AGI score is especially impressive. An updated reasoning model is also on the way, according to Qwen researchers.
How to train a state-of-the-art agent model. Let's talk about the Kimi K2 paper.
Kimi K2 tech report is full of gems as always. Here are my notes on it:
> MuonClip: Pretty crazy how after 70k steps the training stabilizes and QK-clip is basically inactive. There is also no loss in perf with QK-clip, which is not trivial at all (at small scale but with…
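for intuition, here's a rough sketch of the QK-clip idea from the report: after an optimizer step, if a head's max attention logit went above a threshold τ, rescale that head's query/key projection weights to pull the logits back under τ. the function name, the τ=100 default, and the in-place rescale are my assumptions for illustration, not the paper's exact code:

```python
import torch

def qk_clip(w_q: torch.Tensor, w_k: torch.Tensor, max_logit: float, tau: float = 100.0):
    """If the observed max attention logit exceeds tau, shrink W_q and W_k.

    Splitting the factor as sqrt(gamma) on each side means q @ k.T
    shrinks by exactly gamma, capping the logit at tau.
    """
    if max_logit > tau:
        gamma = tau / max_logit        # shrink factor in (0, 1)
        w_q.data.mul_(gamma ** 0.5)    # each projection takes sqrt(gamma)
        w_k.data.mul_(gamma ** 0.5)    # so the product shrinks by gamma
    return w_q, w_k
```

once training settles and the max logits stay under τ, the clip never fires, which matches the "basically inactive after 70k" note above.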
Kimi K2 paper dropped! describes:
- MuonClip optimizer
- large-scale agentic data synthesis pipeline that systematically generates tool-use demonstrations via simulated and real-world environments
- an RL framework that combines RLVR with a self-critique rubric reward mechanism…
I’ve been on a 2-day coffee binge. it’s like my 10th espresso in 42h and I feel limitless. tasks are getting destroyed right now while I listen to lana del rey on blast and on loop, wow