João Gante
@joao_gante
ML @huggingface 🤗, making text generation users happy. 🇵🇹
LET'S GO! Cursor using local 🤗 transformers models! You can now test ANY transformers-compatible LLM against your codebase. From hacking to production, it takes only a few minutes: anything `transformers` can run, you can serve to your app 🔥 Here's a demo with Qwen3 4B:
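For context, a minimal sketch of running the same model locally with the transformers pipeline; the model id and generation settings below are illustrative assumptions, not the demo's exact config:

```python
from transformers import pipeline

# Minimal local-generation sketch; model id and settings are assumptions,
# not the exact configuration used in the demo above.
pipe = pipeline("text-generation", model="Qwen/Qwen3-4B", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```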
The @huggingface Transformers ↔️ @vllm_project integration just leveled up: Vision-Language Models are now supported out of the box! If the model is integrated into Transformers, you can now run it directly with vLLM. github.com/vllm-project/v… Great work @RTurganbay 👏
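A minimal sketch of running a Transformers-integrated model through vLLM's Transformers backend; the model id and the `model_impl="transformers"` flag are assumptions based on vLLM's Transformers fallback, so check the linked PR for the exact VLM usage:

```python
from vllm import LLM, SamplingParams

# Sketch: force vLLM's Transformers backend for a Hub model.
# Model id and the model_impl flag are assumptions; see the linked PR for VLM specifics.
llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct", model_impl="transformers")

params = SamplingParams(max_tokens=64)
outputs = llm.generate(["Describe what a vision-language model does."], params)
print(outputs[0].outputs[0].text)
```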
We already have a solution for kernel install issues; in transformers you can hotswap kernels with this: huggingface.co/kernels-commun… It's a single install, very light (~200 MB?) because it matches only your hardware, and... it will support Metal 😄
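A minimal sketch of the kind of hotswap this enables, based on the kernels library's README; the kernels-community/activation repo and its gelu_fast entry point are assumptions here:

```python
import torch
from kernels import get_kernel

# Fetch a pre-compiled kernel matching the local hardware from the Hub
# (repo id and function name are assumptions taken from the kernels README).
activation = get_kernel("kernels-community/activation")

x = torch.randn((16, 16), dtype=torch.float16, device="cuda")
out = torch.empty_like(x)
activation.gelu_fast(out, x)
print(out)
```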
The current state of the ecosystem for post-training using GRPO w/ vllm + flash attention is frustratingly brittle.
- The most recent vllm only supports PyTorch==2.7.0
- vllm requires xformers, but specifically only v0.0.30 is supported for torch 2.7.0. Any prior version of…
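A small sanity-check sketch reflecting the pins described above; the version constraints are the ones claimed in the post, not independently verified:

```python
from importlib.metadata import version

# Assert the dependency pins described above before launching a GRPO run.
# Versions are the ones claimed in the post, not independently verified.
expected = {"torch": "2.7.0", "xformers": "0.0.30"}
for package, wanted in expected.items():
    installed = version(package)
    assert installed.startswith(wanted), f"{package}=={installed}, expected {wanted}"
print("dependency pins look consistent with the vllm + flash attention setup")
```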
BOOOM! Both VLMs and LLMs now have a baked-in HTTP server w/ an OpenAI-spec-compatible API in transformers. Launch it with `transformers serve` and connect your favorite apps. Here I'm running @OpenWebUI with local transformers. LLMs, VLMs, and tool calling are in; STT & TTS coming soon!
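A minimal client-side sketch, assuming `transformers serve` is already running and exposing the OpenAI-compatible API at http://localhost:8000/v1; the base URL and model name are assumptions, so check your serve logs for the real values:

```python
from openai import OpenAI

# Point any OpenAI-compatible client at the local `transformers serve` endpoint.
# base_url and model name are assumptions; check the serve logs for the actual values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Qwen/Qwen3-4B",
    messages=[{"role": "user", "content": "Say hi in one short sentence."}],
)
print(response.choices[0].message.content)
```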
Personally, I really prefer casual online posts and discussions like this from engineers and researchers over super condensed papers. It's much more pleasant to read and understand the reasoning behind the decisions. Great job @Kimi_Moonshot team! zhihu.com/question/19271…
1T parameters, open-weights, just released on @huggingface!
Looks like it's time Cursor officially put an open model into its default options mix, if this beats SOTA even across all frontier models at agentic coding
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence…
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Cloud models just can't offer some features by design: privacy, low latency, offline use, free generation, etc. Not every application requires 2T parameters. Probably most don't.
Do you believe in local LLMs?
We just released the best 3B model, 100% open-source: open dataset, architecture details, exact data mixtures, and the full training recipe, including pre-training, mid-training, post-training, and synthetic data generation, so everyone can train their own. Let's go open-source AI!
Introducing SmolLM3: a strong, smol reasoner!
> SoTA 3B model
> dual mode reasoning (think/no_think)
> long context, up to 128k
> multilingual: en, fr, es, de, it, pt
> fully open source (data, code, recipes)
huggingface.co/blog/smollm3
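A minimal sketch of toggling the dual reasoning mode; the HuggingFaceTB/SmolLM3-3B model id and the /no_think system flag are assumptions based on the release notes, not verified against the final model card:

```python
from transformers import pipeline

# Dual-mode sketch: model id and the /no_think system flag are assumptions
# based on the release notes, not the verified model card.
pipe = pipeline("text-generation", model="HuggingFaceTB/SmolLM3-3B", device_map="auto")

messages = [
    {"role": "system", "content": "/no_think"},  # skip the extended reasoning trace
    {"role": "user", "content": "Summarize attention in transformers in two sentences."},
]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```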
The biggest dataset of human-written GPU code, all open-source? 👀 YES please! We at @GPU_MODE have released around 40k 🚀 human-written code samples spanning Triton, HIP, and PyTorch, and it's all open on the @huggingface Hub. Train the new GPT to make GPTs faster ⚡️ Link below ⬇️
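A minimal sketch of pulling such a dataset from the Hub with the datasets library; the repo id below is a hypothetical placeholder, since the actual link is elided in the post above:

```python
from datasets import load_dataset

# Hypothetical repo id: the real dataset link is in the post above and is not reproduced here.
ds = load_dataset("GPU-MODE/human-written-gpu-kernels", split="train")

# Inspect one sample to see what fields the dataset exposes.
print(ds[0])
```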
MoE money, MoE problems: it's straight up bonkers that there is not a single finetune of llama 4. zero. zilch. nada. everything on the hub is a reupload. trust me, I've spent the past several weeks trying with torchtune, torchtitan, hf -- anything. it literally just doesn't…