elie
@eliebakouch
Training LLMs at @huggingface | http://hf.co/science
Super excited to share SmolLM3, a new strong 3B model. SmolLM3 is fully open: we share the recipe, the dataset, the training codebase, and much more! > Trained on 11T tokens on 384 H100s for 220k GPU hours > Supports long context up to 128k thanks to NoPE and intra-document masking >…
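As a rough illustration of the intra-document masking idea mentioned above (not the SmolLM3 code itself): when multiple documents are packed into one training sequence, each token is only allowed to attend to earlier tokens from its own document. A minimal sketch in PyTorch, assuming per-token document ids are available:

```python
# Hedged sketch of intra-document causal masking for packed sequences.
# Tokens may only attend to earlier tokens that belong to the same document.
import torch

def intra_document_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    """doc_ids: (seq_len,) int tensor mapping each token to its source document.
    Returns a (seq_len, seq_len) boolean mask, True where attention is allowed."""
    seq_len = doc_ids.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Example: two documents of lengths 3 and 2 packed into one sequence.
mask = intra_document_causal_mask(torch.tensor([0, 0, 0, 1, 1]))
# Convert False -> -inf (or pass as a bool mask) when feeding it to attention.
```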

There is a small group of anons who keep doing fantastic work, like @tensorqt, whom everyone should follow. There are others who push emotional buttons by warping perspectives to be negative-sum or zero-sum. If you are an anon and post really good useful/interesting…
I think anons are essential to the ecosystem but also something has gone awry lately and there is a need for a reset.
This is the Deep Blue moment for soccer
Watch my best shot of the day!
I’ll be presenting TorchTitan: a PyTorch native platform for training foundation models tomorrow at the ICML @ESFoMo workshop! Come and say Hi!
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
🤗🤗🤗 🤗❤️🤗 @huggingface & Cline = your LLM playground 🤗🤗🤗 You can access Kimi K2 & 6,140 (!) other open source models in Cline.
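Outside of Cline, one way to reach the same open models programmatically is the Hugging Face Inference API via huggingface_hub. A minimal sketch; the model id below is an assumption, so check the Hub for the exact Kimi K2 repository name:

```python
# Hedged sketch: calling an open chat model through the Hugging Face Inference API.
from huggingface_hub import InferenceClient

client = InferenceClient()  # picks up HF_TOKEN from the environment if set
response = client.chat_completion(
    model="moonshotai/Kimi-K2-Instruct",  # assumed repo id, verify on the Hub
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```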
Really cool to see SmolLM3, the current state-of-the-art 3B model, land on @Azure AI. @Microsoft is a strong force in small, efficient models, as shown with the Phi family and others, and we've been enjoying partnering with them closely on this and other topics. Thanks @satyanadella…
In this report, we describe the 2025 Apple Foundation Models ("AFM"). We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device AFM model machinelearning.apple.com/research/apple…
.@StanfordCRFM's Marin project has released the first fully open model in JAX. It's an 'open lab' sharing the entire research process - including code, data, and logs - to enable reproducibility and further innovation. developers.googleblog.com/en/stanfords-m…
But actually this is the OG way of doing it, and you should stop by E-2103 to see @jxbz and Laker Newhouse whiteboard the whole paper.
Laker and I are presenting this work in an hour at ICML poster E-2103. It’s on a theoretical framework and language (modula) for optimizers that are fast (like Shampoo) and scalable (like muP). You can think of modula as Muon extended to general layer types and network topologies
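For context on the Muon-style updates this work generalizes, here is a simplified sketch (not the modula codebase, and not tuned for real training): take the momentum of a 2D weight gradient and approximately orthogonalize it with a Newton-Schulz iteration before applying it.

```python
# Hedged sketch of a Muon-style orthogonalized momentum update for one 2D weight.
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map G to an orthogonal matrix with the same 'direction'."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients used in public Muon code
    X = G / (G.norm() + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(weight: torch.Tensor, grad: torch.Tensor, momentum: torch.Tensor,
              lr: float = 0.02, beta: float = 0.95) -> None:
    """One simplified Muon-style update for a single 2D weight matrix."""
    momentum.mul_(beta).add_(grad)
    update = newton_schulz_orthogonalize(momentum)
    weight.add_(update, alpha=-lr)
```

Modula, as described in the poster, extends this kind of update beyond 2D weight matrices to general layer types and network topologies.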
In case the post was too vague, yes - this is the Hermes 3 dataset - 1 Million Samples - Created SOTA without the censorship at its time on the Llama-3 series (8, 70, and 405B) - Has a ton of data for teaching system prompt adherence, roleplay, and a great mix of subjective and…
huggingface.co/datasets/NousR…
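A minimal sketch of pulling the dataset from the Hub with the `datasets` library; the repo id below is a hypothetical placeholder, so substitute the exact id from the link above:

```python
# Hedged sketch: loading a Hub dataset with the `datasets` library.
from datasets import load_dataset

ds = load_dataset("NousResearch/Hermes-3-Dataset", split="train")  # hypothetical repo id
print(ds[0])  # each row is a conversation-style training sample
```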
Considering Muon is so popular and validated at scale, we've just decided to welcome a PR for it in PyTorch core by default. If anyone wants to take a crack at it... github.com/pytorch/pytorc…
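For anyone tempted to take that crack, the plumbing such a PR would sit on is the standard torch.optim.Optimizer subclass interface. A hedged sketch; the class name and hyperparameters are assumptions, not the API that eventually lands in PyTorch core:

```python
# Hedged sketch of the Optimizer interface a Muon contribution would implement.
import torch
from torch.optim import Optimizer

class MuonSketch(Optimizer):
    def __init__(self, params, lr=0.02, momentum=0.95):
        defaults = dict(lr=lr, momentum=momentum)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "momentum_buffer" not in state:
                    state["momentum_buffer"] = torch.zeros_like(p.grad)
                buf = state["momentum_buffer"]
                buf.mul_(group["momentum"]).add_(p.grad)
                # A real Muon step would orthogonalize `buf` (e.g. via the
                # Newton-Schulz iteration sketched earlier) before applying it.
                p.add_(buf, alpha=-group["lr"])
        return loss
```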