younes
@yb2698
Falcon-H1 now runs natively on your device via llama.cpp—0.5B to 34B models, no server needed. Fast inference, long context, multilingual, tool-ready. Build, test, and go beyond. #FalconH1 #LocalLLM #AIOnDevice #EdgeAI #OpenSourceAI
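The post above can be tried in one command. A minimal local-run sketch, assuming a llama.cpp build recent enough to include Falcon-H1's hybrid Mamba+attention support; the repo id below is a hypothetical example, check the Hugging Face hub for the GGUF repos actually published for Falcon-H1:

```shell
# Hypothetical GGUF repo id -- substitute the real Falcon-H1 GGUF repo from the hub.
MODEL_REPO="tiiuae/Falcon-H1-1.5B-Instruct"

# llama-cli can pull directly from the hub with -hf and run fully offline afterwards
# (commented out here because it downloads the model on first use):
# llama-cli -hf "${MODEL_REPO}" -p "Write a haiku about falcons" -n 128

echo "would run: llama-cli -hf ${MODEL_REPO}"
```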
Interesting plot! And amazing work. I am wondering where Falcon-H1-3B would sit on this plot
In case you missed it, you can now run @Microsoft BitNet and @TIIuae Falcon-E/3 (BitNet) at 100+ tok/s on your Mac! Get started now: > pip install -U mlx-lm Model cards 👇🏽
Latest mlx-lm is out! pip install -U mlx-lm Bunch of new models: - SmolLM3 (Hugging Face) - Ernie family (Baidu) - BitNet (Microsoft) - Falcon-E (TII) - Text-only Gemma3n (Google) - MiniCPM4 (OpenBMB) - AFM (Apple) +Performance improvements for DWQ, dynamic quantization, and…
Excited to have contributed to the Falcon-E (BitNet) integration in mlx-lm with @Prince_Canuma @awnihannun. Falcon-E is now fully supported in mlx-lm - as simple as `mlx_lm.generate --model tiiuae/Falcon-E-1B-Instruct --prompt "Implement bubble sort" --max-tokens 100 --temp 0.1` 🚀
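For script use, the CLI call above has a Python-API equivalent via mlx-lm's `load`/`generate` helpers. A minimal sketch, assuming Apple silicon and `pip install -U mlx-lm`; the load/generate calls are commented out because they download the model on first use:

```python
# Same model as the CLI example in the post above.
repo_id = "tiiuae/Falcon-E-1B-Instruct"

# from mlx_lm import load, generate
# model, tokenizer = load(repo_id)
# print(generate(model, tokenizer, prompt="Implement bubble sort", max_tokens=100))
```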
Thanks to some amazing contributions from: @angeloskath @reach_vb @Prince_Canuma @yb2698 @ivanfioravanti @ActuallyIsaak @JohnMai_Dev and others!
Woohoo, what an awesome release 🚀🔥
🚀 Exciting news! Falcon-H1 & Falcon-E are now on Oumi — the open-source platform for training, fine-tuning (SFT, LoRA, QLoRA), and deploying LLMs anywhere: laptops, cloud, or clusters. Start building: github.com/oumi-ai/oumi/t… #FalconH1 #FalconE #OpenSourceAI #LLM
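Oumi runs are driven by YAML configs passed to `oumi train`. A rough sketch of the shape such a LoRA fine-tuning config takes; the exact keys and values here are assumptions, not Oumi's verified schema, so consult the recipes in the linked repo before use:

```yaml
# Hypothetical Oumi-style LoRA fine-tuning config for a Falcon-H1 model.
model:
  model_name: tiiuae/Falcon-H1-1.5B-Instruct   # hypothetical model choice

training:
  trainer_type: TRL_SFT       # supervised fine-tuning
  use_peft: true              # enable LoRA instead of full fine-tuning
  output_dir: output/falcon-h1-lora

peft:
  lora_r: 16
  lora_alpha: 32
```

Launched (on laptop or cluster alike) with something like `oumi train -c falcon_h1_lora.yaml`.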
🚀 Shape the future of LLM evaluation! Join the #E2LM NeurIPS 2025 competition to design benchmarks for scientific reasoning in early LLM training. Models (0.5B–3B) provided by @TIIuae. No big compute needed! Register: e2lmc.github.io #LLMEvaluation #NeurIPS2025
Last 2 weeks: > Gemma3n > Phi4mm: vision working, audio and a few optimisations still missing > Falcon-H1 (Mamba + Transformers) > BitNet Metal kernel, 90% faster on MLX than the official bitnet.cpp > Falcon BitNet > Processed 34M samples and training a new secret model >…
Thanks to @angeloskath we are now 90% faster than bitnet.cpp! We went from 66 tok/s to 100 tok/s on M3 Max🔥🚀 And around 150 tok/s on M2 Ultra
MLX is now almost 30% faster than bitnet.cpp 🚀 Thanks to the new Fused QKV metal kernel I built!
🚀 SGLang now supports Hugging Face Transformers as a backend! Run any transformers-compatible model with fast, production-grade inference — no native support needed. Just plug and play 🥳 Blogpost: huggingface.co/blog/transform…
Great to see Falcon-H1 on @llamafactory_ai !
LLaMA-Factory now supports fine-tuning the Falcon-H1 family of models with full fine-tuning or LoRA, kudos @DhiaRhayem
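LLaMA-Factory training runs are configured with a YAML file passed to `llamafactory-cli train`. A hedged sketch of what a Falcon-H1 LoRA run might look like; the model repo, dataset, and template names are illustrative assumptions, so check LLaMA-Factory's examples for the values registered for Falcon-H1:

```yaml
### model
model_name_or_path: tiiuae/Falcon-H1-1.5B-Instruct   # hypothetical model choice

### method
stage: sft
do_train: true
finetuning_type: lora   # or "full" for full fine-tuning
lora_target: all

### dataset
dataset: alpaca_en_demo   # bundled demo dataset, for illustration
template: falcon          # assumed template name; verify for Falcon-H1
cutoff_len: 2048

### output
output_dir: saves/falcon-h1-lora

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

Run with something like `llamafactory-cli train falcon_h1_lora.yaml`.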