Alaa El-Nouby
@alaa_nouby
Research Scientist at @Apple. Previous: @Meta (FAIR), @Inria, @MSFTResearch, @VectorInst and @UofG
𝗗𝗼𝗲𝘀 𝗮𝘂𝘁𝗼𝗿𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗽𝗿𝗲-𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝘄𝗼𝗿𝗸 𝗳𝗼𝗿 𝘃𝗶𝘀𝗶𝗼𝗻? 🤔 Delighted to share AIMv2, a family of strong, scalable, and open vision encoders that excel at multimodal understanding, recognition, and grounding. github.com/apple/ml-aim (🧵)
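For intuition, here is a minimal sketch of what autoregressive pre-training over image patches can look like: a causal Transformer predicts each patch's pixel values from the patches before it. This is an illustrative toy, not the AIMv2 implementation; all module names, shapes, and the pixel-regression loss below are assumptions.

```python
# Toy sketch (not AIMv2): causal Transformer that predicts the next image patch.
import torch
import torch.nn as nn

class CausalPatchAR(nn.Module):
    def __init__(self, patch_dim=16 * 16 * 3, dim=512, depth=6, heads=8, num_patches=196):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)            # patch -> token
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, patch_dim)             # predict next patch's pixels

    def forward(self, patches):                           # patches: (B, N, patch_dim)
        B, N, _ = patches.shape
        x = self.embed(patches) + self.pos[:, :N]
        mask = nn.Transformer.generate_square_subsequent_mask(N).to(patches.device)
        h = self.encoder(x, mask=mask)                    # causal attention over patches
        pred = self.head(h[:, :-1])                       # predictions for patches 2..N
        return nn.functional.mse_loss(pred, patches[:, 1:])
```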
Feed the Children of Gaza, no child should be starving.
Consistent with its historic commitment to a just and lasting peace in the Middle East, I have decided that France will recognize the State of Palestine. I will make this solemn announcement before the United Nations General Assembly this coming September.…
we missed a banger paper in the grok4/k2 drop noise, guys. these guys: > look for optimal ways to select data mixes to get max improvement on a model given a target domain > do multimodal validation > show good extrapolation accuracy (testing on 1.4B and predicting on 8B)
If you are attending ICML today, consider checking out Samara’s poster on the role of sparsity in MoEs at 11 AM PDT. Poster ID: E-2810
🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute? We explored this through the lens of MoEs:
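As a toy illustration of the parameters-vs-compute axis, here is a minimal top-k MoE layer: adding experts grows the parameter count, while each token still activates only k experts, so per-token compute stays roughly flat. Purely illustrative and not tied to any paper's implementation; all names and sizes are assumptions.

```python
# Sketch of a top-k MoE feed-forward layer: more experts = more parameters,
# but only k experts run per token, so per-token compute stays roughly fixed.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim=512, hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (tokens, dim)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):              # each token visits only k experts
            for e, expert in enumerate(self.experts):
                sel = idx[:, slot] == e
                if sel.any():
                    out[sel] += weights[sel, slot, None] * expert(x[sel])
        return out
```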
Deciding which data mixture to use has always been such a crucial part of nailing a good pre-training recipe. Check out this paper, led by @PierreAblin, @MustafaShukor1, and the team at Apple MLR, providing a principled way to select optimal data mixture weights!
We propose new scaling laws that predict the optimal data mixture for pretraining LLMs, native multimodal models, and large vision encoders! Only small-scale experiments are needed, and we can then extrapolate to large-scale ones. These laws allow 1/n 🧵
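The general recipe could be sketched roughly as follows: fit a parametric loss model on small-scale runs over (model size, mixture weights), then pick the mixture that minimizes the predicted loss at the target scale. The functional form, measurements, and numbers below are illustrative assumptions, not the paper's actual law or data.

```python
# Illustrative only: fit an assumed loss law L(N, w) on toy small-scale runs,
# then extrapolate to find the mixture weight minimizing predicted loss at a
# much larger model size.
import numpy as np
from scipy.optimize import curve_fit, minimize

# Toy small-scale measurements: (model params N, weight of domain A) -> loss
runs_N = np.array([1e8, 1e8, 1e8, 3e8, 3e8, 3e8])
runs_w = np.array([0.2, 0.5, 0.8, 0.2, 0.5, 0.8])
runs_L = np.array([3.10, 2.95, 3.05, 2.90, 2.74, 2.86])

def law(X, c, alpha, a, b):
    N, w = X
    return c * N ** (-alpha) + a * (w - b) ** 2    # assumed parametric form

params, _ = curve_fit(law, (runs_N, runs_w), runs_L,
                      p0=[10.0, 0.1, 1.0, 0.5], maxfev=10000)

# Extrapolate: best mixture weight for a much larger model
N_target = 8e9
res = minimize(lambda w: law((N_target, w[0]), *params),
               x0=[0.5], bounds=[(0.0, 1.0)])
print("predicted optimal weight for domain A:", res.x[0])
```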
@CVPR is around the corner!! Join us at the Workshop on T4V at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yang4ai, @WenhuChen, @roeiherzig, Yuheng Li, Kristen Grauman) covering diverse topics! Website: sites.google.com/view/t4v-cvpr2… #CVPR #Transformer #Vision #T4V2025 #T4V
The Worldwide @LeRobotHF hackathon is in 2 weeks, and we have been cooking something for you… Introducing SmolVLA, a Vision-Language-Action model with a lightweight architecture, pretrained on community datasets, with an asynchronous inference stack to control robots 🧵
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics PAPER: arxiv.org/abs/2506.01844
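A rough sketch of the asynchronous-inference idea (hypothetical `policy` and `robot` interfaces, not the SmolVLA/LeRobot API): the robot executes the current action chunk while the next chunk is already being predicted in the background, so control does not stall on model latency.

```python
# Hypothetical interfaces: policy(observation) -> list of actions,
# robot.observe() -> observation, robot.step(action) -> executes one action.
from concurrent.futures import ThreadPoolExecutor

def control_loop(policy, robot, steps=100):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(policy, robot.observe())      # first prediction
        for _ in range(steps):
            chunk = future.result()                        # action chunk to run now
            future = pool.submit(policy, robot.observe())  # next prediction, in background
            for action in chunk:                           # executes while the model runs
                robot.step(action)
```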
Hello World: My team at FAIR / @metaai (AI Research Agent) is looking to hire contractors across software engineering and ML. If you are interested and based in the UK, please fill in the following short EoI form: docs.google.com/forms/d/e/1FAI…
Me and the Egyptian King 👑 best player in the world - 47 G/As, totally unreal season. Let me know if you ever fancy a game of online chess! 😀 @MoSalah
I don't have to tell you what happened to these three boys. You already know. How awful is that?
Proud to report that TarFlow is accepted to #ICML2025 as a Spotlight 🎉 I’m really looking forward to new ideas and applications enabled by powerful Normalizing Flow models 🚀
We attempted to make Normalizing Flows work really well, and we are happy to report our findings in the paper arxiv.org/pdf/2412.06329 and the code github.com/apple/ml-tarfl…. [1/n]
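For readers new to flows, a minimal sketch of the normalizing-flow training objective: exact maximum likelihood via the change-of-variables formula. The code uses a generic affine coupling layer for brevity; TarFlow itself is a Transformer-based autoregressive flow, so treat this only as an illustration of the objective.

```python
# Generic normalizing-flow sketch (not TarFlow's architecture): affine coupling
# layers trained with the exact log-likelihood under a standard Gaussian prior.
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim=64, hidden=256):
        super().__init__()
        # maps the first half of the features to a scale and shift for the second half
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):                         # x: (B, dim)
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                         # keep the scale well-behaved
        z2 = x2 * s.exp() + t
        logdet = s.sum(dim=-1)                    # log|det Jacobian| of this layer
        return torch.cat([x1, z2], dim=-1), logdet

def nll(layers, x):
    z, logdet = x, x.new_zeros(x.shape[0])
    for layer in layers:
        z, ld = layer(z)
        logdet = logdet + ld
        z = z.flip(dims=[-1])                     # permute so both halves get transformed
    log_prior = -0.5 * (z ** 2).sum(dim=-1) - 0.5 * z.shape[-1] * math.log(2 * math.pi)
    return -(log_prior + logdet).mean()           # exact negative log-likelihood
```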
I’ve been curious about how early- vs late-fusion multimodal approaches compare under controlled conditions. Great to see this studied in depth. Turns out, optimal late-fusion models have a higher params-to-data ratio, and performance between early and late fusion is similar. Brilliant work from…
We release a large-scale study to answer the following: - Is late fusion inherently better than early fusion for multimodal models? - How do native multimodal models scale compared to LLMs? - How can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
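For reference, a schematic contrast between the two setups (illustrative modules and shapes, not the paper's code): early fusion feeds image and text tokens into one shared Transformer from the first layer, while late fusion first runs a modality-specific vision encoder and only then hands its features to the language model.

```python
# Schematic early-fusion vs late-fusion models; all dimensions are illustrative.
import torch
import torch.nn as nn

dim, heads, depth = 512, 8, 4
block = lambda: nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True), depth
)

class EarlyFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.patch_embed = nn.Linear(768, dim)    # raw patches -> tokens
        self.text_embed = nn.Embedding(32000, dim)
        self.trunk = block()                      # one shared trunk sees both modalities

    def forward(self, patches, text_ids):
        tokens = torch.cat([self.patch_embed(patches), self.text_embed(text_ids)], dim=1)
        return self.trunk(tokens)

class LateFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.patch_embed = nn.Linear(768, dim)
        self.vision_encoder = block()             # modality-specific parameters
        self.text_embed = nn.Embedding(32000, dim)
        self.language_model = block()

    def forward(self, patches, text_ids):
        vision = self.vision_encoder(self.patch_embed(patches))
        tokens = torch.cat([vision, self.text_embed(text_ids)], dim=1)
        return self.language_model(tokens)
```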
We have been thinking a lot about how to train truly native multimodal models: (1) what arch to use (early-fusion, late-fusion, MoEs)? (2) the impact of data mixtures (interleaved, img-cap, text data) We took a stab at answering these questions (and more) in this preprint ...
Excited to see further studies into early fusion vs late fusion models, in particular a great analysis of multimodal MoEs, aligned with our findings in MoMa on designing parameter specialization in multimodal LLMs. A few key things that helped us on top of the results presented…
Apple just broke the scaling laws for image models. Imagine creating Ghibli art, but 10x faster.
Mustafa keeps releasing multimodal bangers