João Gante
@joao_gante
ML @huggingface 🤗, making text generation users happy. 🇵🇹
LET'S GO! Cursor using local 🤗 transformers models! You can now test ANY transformers-compatible LLM against your codebase. From hacking to production, it takes only a few minutes: anything `transformers` can run, you can serve to your app 🔥 Here's a demo with Qwen3 4B:
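For context, a minimal sketch of running the same model locally with the transformers pipeline; the model id and generation settings below are illustrative assumptions, not the demo's exact config:

```python
from transformers import pipeline

# Minimal local-generation sketch; model id and settings are assumptions,
# not the exact configuration used in the demo above.
pipe = pipeline("text-generation", model="Qwen/Qwen3-4B", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```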
The @huggingface Transformers ↔️ @vllm_project integration just leveled up: Vision-Language Models are now supported out of the box! If the model is integrated into Transformers, you can now run it directly with vLLM. github.com/vllm-project/v… Great work @RTurganbay 👏
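A minimal sketch of running a Transformers-integrated model through vLLM's Transformers backend; the model id and the `model_impl="transformers"` flag are assumptions based on vLLM's Transformers fallback, so check the linked PR for the exact VLM usage:

```python
from vllm import LLM, SamplingParams

# Sketch: force vLLM's Transformers backend for a Hub model.
# Model id and the model_impl flag are assumptions; see the linked PR for VLM specifics.
llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct", model_impl="transformers")

params = SamplingParams(max_tokens=64)
outputs = llm.generate(["Describe what a vision-language model does."], params)
print(outputs[0].outputs[0].text)
```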
We already have a solution for kernel install issues; in transformers you can hotswap kernels with this: huggingface.co/kernels-commun… It's a single install, very light (~200 MB?) because it matches only your hardware, and... it will support Metal 😄
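A minimal sketch of the kind of hotswap this enables, based on the kernels library's README; the kernels-community/activation repo and its gelu_fast entry point are assumptions here:

```python
import torch
from kernels import get_kernel

# Fetch a pre-compiled kernel matching the local hardware from the Hub
# (repo id and function name are assumptions taken from the kernels README).
activation = get_kernel("kernels-community/activation")

x = torch.randn((16, 16), dtype=torch.float16, device="cuda")
out = torch.empty_like(x)
activation.gelu_fast(out, x)
print(out)
```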
The current state of the ecosystem for post-training using GRPO w/ vllm + flash attention is frustratingly brittle.
- The most recent vllm only supports PyTorch==2.7.0
- vllm requires xformers, but specifically only v0.0.30 is supported for torch 2.7.0. Any prior version of…
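A small sanity-check sketch reflecting the pins described above; the version constraints are the ones claimed in the post, not independently verified:

```python
from importlib.metadata import version

# Assert the dependency pins described above before launching a GRPO run.
# Versions are the ones claimed in the post, not independently verified.
expected = {"torch": "2.7.0", "xformers": "0.0.30"}
for package, wanted in expected.items():
    installed = version(package)
    assert installed.startswith(wanted), f"{package}=={installed}, expected {wanted}"
print("dependency pins look consistent with the vllm + flash attention setup")
```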
BOOOM! Both VLMs and LLMs now have a baked-in HTTP server w/ an OpenAI-spec-compatible API in transformers. Launch it with `transformers serve` and connect your favorite apps. Here I'm running @OpenWebUI with local transformers. LLMs, VLMs, and tool calling are in; STT & TTS coming soon!
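A minimal client-side sketch, assuming `transformers serve` is already running and exposing the OpenAI-compatible API at http://localhost:8000/v1; the base URL and model name are assumptions, so check your serve logs for the real values:

```python
from openai import OpenAI

# Point any OpenAI-compatible client at the local `transformers serve` endpoint.
# base_url and model name are assumptions; check the serve logs for the actual values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Qwen/Qwen3-4B",
    messages=[{"role": "user", "content": "Say hi in one short sentence."}],
)
print(response.choices[0].message.content)
```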
Personally, I really prefer casual online posts and discussions like this from engineers and researchers over super condensed papers. It's much more pleasant to read and understand the reasoning behind the decisions. Great job @Kimi_Moonshot team! zhihu.com/question/19271…
1T parameters, open-weights, just released on @huggingface!
Looks like it's time Cursor officially put an open model into its default options mix, if this beats SOTA even across all frontier models at agentic coding
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence…
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Cloud models just can't offer some features by design: privacy, low latency, offline use, free generation, etc. Not every application requires 2T parameters. Probably most don't.
Do you believe in local LLMs?
We just released the best 3B model, 100% open-source: open dataset, architecture details, exact data mixtures, and the full training recipe, including pre-training, mid-training, post-training, and synthetic data generation, so everyone can train their own. Let's go open-source AI!
Introducing SmolLM3: a strong, smol reasoner!
> SoTA 3B model
> dual mode reasoning (think/no_think)
> long context, up to 128k
> multilingual: en, fr, es, de, it, pt
> fully open source (data, code, recipes)
huggingface.co/blog/smollm3
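A minimal sketch of toggling the dual reasoning mode; the HuggingFaceTB/SmolLM3-3B model id and the /no_think system flag are assumptions based on the release notes, not verified against the final model card:

```python
from transformers import pipeline

# Dual-mode sketch: model id and the /no_think system flag are assumptions
# based on the release notes, not the verified model card.
pipe = pipeline("text-generation", model="HuggingFaceTB/SmolLM3-3B", device_map="auto")

messages = [
    {"role": "system", "content": "/no_think"},  # skip the extended reasoning trace
    {"role": "user", "content": "Summarize attention in transformers in two sentences."},
]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```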
The biggest dataset of human-written GPU code, all open-source? 👀 YES please! We at @GPU_MODE have released around 40k 🚀 human-written code samples spanning Triton, HIP, and PyTorch, and it's all open on the @huggingface Hub. Train the new GPT to make GPTs faster ⚡️ Link below ⬇️
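A minimal sketch of pulling such a dataset from the Hub with the datasets library; the repo id below is a hypothetical placeholder, since the actual link is elided in the post above:

```python
from datasets import load_dataset

# Hypothetical repo id: the real dataset link is in the post above and is not reproduced here.
ds = load_dataset("GPU-MODE/human-written-gpu-kernels", split="train")

# Inspect one sample to see what fields the dataset exposes.
print(ds[0])
```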
MoE money, MoE problems: it's straight up bonkers that there is not a single finetune of llama 4. zero. zilch. nada. everything on the hub is a reupload. trust me, I've spent the past several weeks trying with torchtune, torchtitan, hf -- anything. it literally just doesn't…