Lewis Tunstall
@_lewtun
🤗 LLM whisperer @huggingface 📖 Co-author of "NLP with Transformers" book 💥 Ex-particle physicist 🤘 Occasional guitarist 🇦🇺 in 🇨🇭
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret we can do it together in the open! 🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1. 🧠…
There's now support for viewing JSON in string / dict columns in @huggingface datasets!!! 🔍 Great for all the tool calling datasets like the brand new hermes tool use dataset by @intrstllrninja
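A string column that holds JSON like the tool-calling datasets above can always be parsed back into structured data locally too. A minimal sketch, with an assumed schema (the `tools` field and `get_weather` entry are illustrative, not from the actual dataset):

```python
import json

# Hypothetical row from a tool-calling dataset where the "tools" column
# stores a JSON array serialized as a string.
row = {"tools": '[{"name": "get_weather", "parameters": {"city": "string"}}]'}

# Parse the string column into real Python objects for inspection.
tools = json.loads(row["tools"])
print(tools[0]["name"])  # -> get_weather
```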
NEW 🔥!! You can now view JSON for List cells on @huggingface datasets. Now there's no excuse for not looking at your data! 🫣
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7
Paranoia (aka looking at your data) is the main difference between a model having garbage vibes or not :)
What I look for when hiring? EXTREME PARANOIA about code and data
After three intense months of hard work with the team, we made it! We hope this release can help drive the progress of Coding Agents. Looking forward to seeing Qwen3-Coder continue creating new possibilities across the digital world!
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
You should join Cody for the memes alone 🦝
We are looking for a post-training lead at @datologyai we have gpus, you can make them go brrrr
Excited to share 🤯 that our LMUnit models with @ContextualAI just claimed the top spots on RewardBench2 🥇 How did we manage to score 5%+ higher than models like Gemini, Claude 4, and GPT-4.1? More in the details below: 🧵 1/11
today i'm releasing a 50k-row tool-use reasoning dataset compilation on huggingface it includes the following BFCL scenarios: - single-turn tool-use - multi-turn tool-use - multi-step tool-use - relevance reasoning huggingface.co/datasets/inter…
OpenAI and GDM should release IMO reasoning traces. For Science.
We've just released 100+ intermediate checkpoints and our training logs from the SmolLM3-3B training run. We hope this can be useful to researchers working on mech interp, training dynamics, RL, and other topics :) Training logs: -> Usual training loss (the gaps in the loss are due…
It's clear that the next big thing after the shift from RLHF to "RLVR"* is scaling reward models ("verifiers") for concrete capabilities, not just average human preferences. This actually kinda looks very similar to RLHF. The main difference is that the verifiers here: [A] Are…
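The RLHF-vs-"RLVR" distinction above comes down to the reward signal: a verifier checks a concrete, objective property rather than scoring average human preference. A toy sketch of a verifiable reward (the function name and answer-extraction heuristic are illustrative assumptions, not anyone's actual verifier):

```python
import re

def math_verifier(completion: str, ground_truth: str) -> float:
    """Toy RLVR-style reward: 1.0 if the last number in the completion
    matches the known answer, else 0.0. Checkable, not a learned preference."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == ground_truth else 0.0

print(math_verifier("The sum is 3 + 4 = 7", "7"))  # -> 1.0
print(math_verifier("I'd guess around six.", "7"))  # -> 0.0
```

The point of the contrast: this reward is binary and auditable, so it scales to concrete capabilities without a preference model in the loop.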
The sad robot in matharena.ai/imo/ is Grok 4. This shows again how careful one has to be with overblown claims from closed releases saying the usual "it's so over". Test contamination that cannot be checked makes benchmarks look great, but on novel problems the crash comes.
Two cents on AI getting International Math Olympiad (IMO) Gold, from a mathematician. Background: Last year, Google DeepMind (GDM) got Silver in IMO 2024. This year, OpenAI solved problems P1-P5 for IMO 2025 (but not P6), and this performance corresponds to Gold. (1/10)
Here’s how you train an email agent from scratch with GRPO 👇 1️⃣ Nail a prompted baseline first. It flushes out tool bugs & gives you a benchmark to beat. 2️⃣ When the plateau hits, switch to RL. A 14B model jumped 40%→96% —beating o3 & Gemini—by laser-focusing on one job.
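Step 2's RL phase hinges on a reward the trainer can score each sampled completion against. A minimal sketch of such a reward function, in the shape GRPO-style trainers expect (one scalar per completion); the `search_inbox` tool name and the scoring rules are hypothetical, not from the original thread:

```python
import re

def email_agent_reward(completions):
    """Score each sampled completion for the email-agent task.

    Hypothetical rubric: 1.0 for a well-formed search_inbox tool call,
    0.2 for attempting any call at all, 0.0 otherwise.
    """
    rewards = []
    for text in completions:
        if re.search(r'search_inbox\(query="[^"]+"\)', text):
            rewards.append(1.0)  # correct tool, well-formed argument
        elif "(" in text and ")" in text:
            rewards.append(0.2)  # tried to call something
        else:
            rewards.append(0.0)  # no tool use at all
    return rewards

print(email_agent_reward([
    'search_inbox(query="ACME invoice")',  # -> 1.0
    'fetch(inbox_url)',                    # -> 0.2
    'I cannot help with that.',            # -> 0.0
]))
```

Running the prompted baseline through the same scorer is what gives you the benchmark to beat before switching the model into the RL loop.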
What seems like an exponential in AI is just a series of S-curves. Each era rides on a wave of increasing compute but finds a new way to utilise it, overcoming the limitations of the previous stage. E.g. pre-training was the dominant way to utilise compute, but the limitations of…
I think it's quite debatable to say it came as a surprise. Was 20% this week but it's been a lot higher. Seems hard to reason about. I don't like it when people say "it's safe to say" about things that are not in fact safe to say.
I think it's safe to say this @OpenAI IMO gold result came as a bit of a surprise to folks