younes
@yb2698
Falcon-H1 now runs natively on your device via llama.cpp—0.5B to 34B models, no server needed. Fast inference, long context, multilingual, tool-ready. Build, test, and go beyond. #FalconH1 #LocalLLM #AIOnDevice #EdgeAI #OpenSourceAI
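The post above can be tried in one command. A minimal local-run sketch, assuming a llama.cpp build recent enough to include Falcon-H1's hybrid Mamba+attention support; the repo id below is a hypothetical example, check the Hugging Face hub for the GGUF repos actually published for Falcon-H1:

```shell
# Hypothetical GGUF repo id -- substitute the real Falcon-H1 GGUF repo from the hub.
MODEL_REPO="tiiuae/Falcon-H1-1.5B-Instruct"

# llama-cli can pull directly from the hub with -hf and run fully offline afterwards
# (commented out here because it downloads the model on first use):
# llama-cli -hf "${MODEL_REPO}" -p "Write a haiku about falcons" -n 128

echo "would run: llama-cli -hf ${MODEL_REPO}"
```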
Interesting plot! And amazing work. I am wondering where Falcon-H1-3B would sit on this plot
In case you missed it, you can now run @Microsoft BitNet and @TIIuae Falcon-E/3 (BitNet) at 100+ tok/s on your Mac! Get started now: > pip install -U mlx-lm Model cards 👇🏽
Latest mlx-lm is out! pip install -U mlx-lm Bunch of new models: - SmolLM3 (Hugging Face) - Ernie family (Baidu) - BitNet (Microsoft) - Falcon-E (TII) - Text-only Gemma3n (Google) - MiniCPM4 (OpenBMB) - AFM (Apple) +Performance improvements for DWQ, dynamic quantization, and…
Excited to have contributed to the Falcon-E (BitNet) integration in mlx-lm with @Prince_Canuma @awnihannun. Falcon-E is now fully supported in mlx-lm - as simple as `mlx_lm.generate --model tiiuae/Falcon-E-1B-Instruct --prompt "Implement bubble sort" --max-tokens 100 --temp 0.1` 🚀
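For script use, the CLI call above has a Python-API equivalent via mlx-lm's `load`/`generate` helpers. A minimal sketch, assuming Apple silicon and `pip install -U mlx-lm`; the load/generate calls are commented out because they download the model on first use:

```python
# Same model as the CLI example in the post above.
repo_id = "tiiuae/Falcon-E-1B-Instruct"

# from mlx_lm import load, generate
# model, tokenizer = load(repo_id)
# print(generate(model, tokenizer, prompt="Implement bubble sort", max_tokens=100))
```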
Thanks to some amazing contributions from: @angeloskath @reach_vb @Prince_Canuma @yb2698 @ivanfioravanti @ActuallyIsaak @JohnMai_Dev and others!
Woohoo, what an awesome release 🚀🔥
🚀 Exciting news! Falcon-H1 & Falcon-E are now on Oumi — the open-source platform for training, fine-tuning (SFT, LoRA, QLoRA), and deploying LLMs anywhere: laptops, cloud, or clusters. Start building: github.com/oumi-ai/oumi/t… #FalconH1 #FalconE #OpenSourceAI #LLM
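Oumi runs are driven by YAML configs passed to `oumi train`. A rough sketch of the shape such a LoRA fine-tuning config takes; the exact keys and values here are assumptions, not Oumi's verified schema, so consult the recipes in the linked repo before use:

```yaml
# Hypothetical Oumi-style LoRA fine-tuning config for a Falcon-H1 model.
model:
  model_name: tiiuae/Falcon-H1-1.5B-Instruct   # hypothetical model choice

training:
  trainer_type: TRL_SFT       # supervised fine-tuning
  use_peft: true              # enable LoRA instead of full fine-tuning
  output_dir: output/falcon-h1-lora

peft:
  lora_r: 16
  lora_alpha: 32
```

Launched (on laptop or cluster alike) with something like `oumi train -c falcon_h1_lora.yaml`.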
🚀 Shape the future of LLM evaluation! Join the #E2LM NeurIPS 2025 competition to design benchmarks for scientific reasoning in early LLM training. Models (0.5B–3B) provided by @TIIuae. No big compute needed! Register: e2lmc.github.io #LLMEvaluation #NeurIPS2025
Last 2 weeks: > Gemma3n > Phi4mm: vision working, audio and a few optimisations still missing > Falcon-H1 (Mamba + Transformers) > BitNet Metal kernel, 90% faster on MLX than the official bitnet.cpp > Falcon BitNet > Processed 34M samples and training a new secret model >…
Thanks to @angeloskath we are now 90% faster than bitnet.cpp! We went from 66 tok/s to 100 tok/s on M3 Max🔥🚀 And around 150 tok/s on M2 Ultra
MLX is now almost 30% faster than bitnet.cpp 🚀 Thanks to the new Fused QKV metal kernel I built!
🚀 SGLang now supports Hugging Face Transformers as a backend! Run any transformers-compatible model with fast, production-grade inference — no native support needed. Just plug and play 🥳 Blogpost: huggingface.co/blog/transform…
Great to see Falcon-H1 on @llamafactory_ai !
LLaMA-Factory now supports fine-tuning the Falcon-H1 family of models with full fine-tuning or LoRA, kudos @DhiaRhayem
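LLaMA-Factory training runs are configured with a YAML file passed to `llamafactory-cli train`. A hedged sketch of what a Falcon-H1 LoRA run might look like; the model repo, dataset, and template names are illustrative assumptions, so check LLaMA-Factory's examples for the values registered for Falcon-H1:

```yaml
### model
model_name_or_path: tiiuae/Falcon-H1-1.5B-Instruct   # hypothetical model choice

### method
stage: sft
do_train: true
finetuning_type: lora   # or "full" for full fine-tuning
lora_target: all

### dataset
dataset: alpaca_en_demo   # bundled demo dataset, for illustration
template: falcon          # assumed template name; verify for Falcon-H1
cutoff_len: 2048

### output
output_dir: saves/falcon-h1-lora

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

Run with something like `llamafactory-cli train falcon_h1_lora.yaml`.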