Benjamin Warner
@benjamin_warner
Today we released ModernBERT, the first encoder to reach SOTA on the most common benchmarks across language understanding, retrieval, and code, while running twice as fast as DeBERTaV3 on short context and three times as fast as NomicBERT & GTE on long context.
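A minimal sketch of trying the released model, assuming a transformers version with ModernBERT support and that the public answerdotai/ModernBERT-base checkpoint is the one being referenced:

```python
from transformers import pipeline

# Assumes transformers >= 4.48 (ModernBERT support) and the public
# answerdotai/ModernBERT-base checkpoint; swap in another ModernBERT variant as needed.
fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

for pred in fill_mask("Encoders like ModernBERT are often used for [MASK] and classification."):
    print(f"{pred['token_str']:>15}  {pred['score']:.3f}")
```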

hey guys did you know SWEBench is like ~70% one single repository and that one repository is Django
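A quick way to sanity-check that claim yourself; a sketch assuming the public princeton-nlp/SWE-bench_Verified dataset on the Hub and its "repo" column:

```python
from collections import Counter
from datasets import load_dataset

# Assumes the princeton-nlp/SWE-bench_Verified dataset with a "repo" field per task.
ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")
counts = Counter(ds["repo"])

total = sum(counts.values())
for repo, n in counts.most_common(5):
    print(f"{repo:<30} {n:>4}  ({n / total:.0%})")
```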
Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. Built in…
HLE has recently become the benchmark to beat for frontier agents. We at @FutureHouseSF took a closer look at the chem and bio questions and found that about 30% of them are likely invalid, based on our analysis and third-party PhD evaluations. 1/7
We've just released 100+ intermediate checkpoints and our training logs from the SmolLM3-3B training run. We hope this can be useful to researchers working on mech interp, training dynamics, RL and other topics :) Training logs: -> Usual training loss (the gaps in the loss are due…
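A sketch of how intermediate checkpoints like these are typically loaded from the Hub via revisions; only the HuggingFaceTB/SmolLM3-3B repo id comes from the announcement, and the revision name below is a hypothetical placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# HuggingFaceTB/SmolLM3-3B is the released model; the revision string is a
# placeholder for one of the intermediate checkpoints (check the repo's
# branches/tags for the actual names).
repo = "HuggingFaceTB/SmolLM3-3B"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, revision="main")  # e.g. an intermediate-step branch
```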
Kimi K2 tech report is full of gems as always. Here are my notes on it: > MuonClip: Pretty crazy how after 70k steps the training stabilizes and the QK-clip is basically inactive. There is also no loss in perf with QK-clip, which is not trivial at all (at small scale but with…
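For readers who haven't seen QK-clip, here's an illustrative toy sketch of the idea (not the exact Kimi K2 / MuonClip recipe): after an optimizer step, measure the largest pre-softmax attention logit on a recent batch and, if it exceeds a threshold tau, rescale the query and key projections to pull it back down. The threshold value and single-head setup are assumptions.

```python
import torch
import torch.nn as nn

def qk_clip_(w_q: nn.Linear, w_k: nn.Linear, x: torch.Tensor, tau: float = 100.0) -> float:
    # Toy, single-head version: if the max logit exceeds tau, scale W_q and W_k
    # each by sqrt(tau / max_logit) so their product pulls the max logit back to tau.
    with torch.no_grad():
        q, k = w_q(x), w_k(x)                        # (batch, seq, dim)
        d = q.shape[-1]
        logits = q @ k.transpose(-1, -2) / d ** 0.5  # pre-softmax attention logits
        max_logit = logits.amax().item()
        if max_logit > tau:
            scale = (tau / max_logit) ** 0.5
            w_q.weight.mul_(scale)
            w_k.weight.mul_(scale)
    return max_logit

# toy usage with random weights and activations
w_q, w_k = nn.Linear(64, 64, bias=False), nn.Linear(64, 64, bias=False)
x = torch.randn(2, 16, 64)
print(qk_clip_(w_q, w_k, x, tau=5.0))
```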
From GPT to MoE: I reviewed & compared the main LLMs of 2025 in terms of their architectural design, from DeepSeek-V3 to Kimi K2. Multi-head Latent Attention, sliding window attention, new Post- & Pre-Norm placements, NoPE, shared-expert MoEs, and more... magazine.sebastianraschka.com/p/the-big-llm-…
Reports of AI eating entry-level jobs are greatly exaggerated. My guess is that current and near-future LLMs are more likely to increase the demand for programmers than to decrease it (Jevons Paradox).
But, plot twist: The much-discussed contraction in entry-level tech hiring appears to have *reversed* in recent months. In fact, relative to the pre-generative AI era, recent grads have secured coding jobs at the same rate as they’ve found any job, if not slightly higher.
We present DreamOn: a simple yet effective method for variable-length generation in diffusion language models. Our approach boosts code infilling performance significantly and even catches up with oracle results.
🤔 Have you ever wondered how good ModernBERT is compared to decoders like Llama? We made an open-data version of ModernBERT and used the same recipe for encoders and decoders. Turns out, our encoder model beats ModernBERT and our decoder model beats Llama 3.2 / SmolLM2 🤯 🧵
I replicated this result on a fresh Grok 4 chat with no custom instructions: Grok focuses almost entirely on finding out what Elon thinks in order to align with that. grok.com/share/c2hhcmQt…
Grok 4 decides what it thinks about Israel/Palestine by searching for Elon's thoughts. Not a confidence booster in "maximally truth seeking" behavior. h/t @catehall. Screenshots are mine.
The best part about Grok 4 is not knowing if you'll get a SOTA or MechaHitler response to any given prompt
Pretty cool, especially for long sequences! I will note that you can pretty easily get much better numbers for torch.compile, numbers that are much closer for sequences up to about 16384. A couple of things: 1. By default torch.compile generates dynamic-shapes kernels when benchmarked…
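A hedged sketch of the benchmarking detail being raised, using an RMSNorm stand-in kernel (the kernel choice and shapes are assumptions): compile with dynamic=False and keep input shapes fixed, so torch.compile specializes on static shapes instead of falling back to dynamic-shape kernels.

```python
import torch

def rmsnorm(x, w, eps=1e-6):
    # stand-in mem-bound kernel for the benchmark
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * w

# dynamic=False asks torch.compile to specialize on the exact shapes it sees
# rather than generating dynamic-shape kernels after a few recompiles.
compiled = torch.compile(rmsnorm, dynamic=False, mode="max-autotune")

x = torch.randn(1, 16384, 4096, device="cuda", dtype=torch.bfloat16)
w = torch.ones(4096, device="cuda", dtype=torch.bfloat16)
compiled(x, w)            # first call triggers compilation
torch.cuda.synchronize()  # time subsequent calls only, with the same shapes
```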
Getting mem-bound kernels to speed-of-light isn't a dark art; it's just about getting a couple of details right. We wrote a tutorial on how to do this, with code you can use directly. Thanks to the new CuTe-DSL, we can hit speed-of-light without a single line of CUDA C++.
🦆🚀QuACK🦆🚀: a new speed-of-light (SOL) mem-bound kernel library without a single line of CUDA C++, all straight in Python thanks to CuTe-DSL. On H100 with 3TB/s, it performs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 With @tedzadouri and @tri_dao
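As a rough sketch of what "speed-of-light" means for a mem-bound op: count the bytes the kernel must move, time it, and divide achieved bandwidth by the GPU's peak HBM bandwidth. The ~3.35 TB/s H100 SXM figure and the softmax example below are assumptions for illustration.

```python
import time
import torch

PEAK_BYTES_PER_S = 3.35e12  # approx. H100 SXM HBM bandwidth; adjust for your GPU

def bandwidth_fraction(fn, *tensors, iters=100):
    # rough read + write byte count for a mem-bound, elementwise-ish op
    bytes_moved = 2 * sum(t.numel() * t.element_size() for t in tensors)
    for _ in range(10):          # warmup
        fn(*tensors)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*tensors)
    torch.cuda.synchronize()
    per_call = (time.perf_counter() - t0) / iters
    return bytes_moved / per_call / PEAK_BYTES_PER_S

x = torch.randn(16384, 8192, device="cuda", dtype=torch.bfloat16)
frac = bandwidth_fraction(lambda t: torch.softmax(t, dim=-1), x)
print(f"softmax achieves {frac:.0%} of speed-of-light")
```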
Super excited to share SmolLM3, a new strong 3B model. SmolLM3 is fully open: we share the recipe, the dataset, the training codebase and much more! > Trained on 11T tokens on 384 H100s for 220k GPU hours > Supports long context up to 128k thanks to NoPE and intra-document masking >…
So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me.)
Marin 32B training crossed 1.5 trillion tokens today...
Multi-vector search is beating agentic RAG by miles. But it comes at a cost. Or does it? Recently, new multi-vector models like Reason-ModernColBERT have outperformed popular reasoning-based retrieval models like ReasonIR-8B. The embedding-per-token approach of ColBERT style…
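For context on "embedding-per-token", here's a minimal sketch of ColBERT-style late-interaction (MaxSim) scoring; the shapes, dimensions, and dot-product similarity are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def maxsim_score(q: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    # q: (num_query_tokens, dim), d: (num_doc_tokens, dim), both L2-normalized.
    # Each query token keeps its best match over all document tokens; sum the maxes.
    sim = q @ d.T
    return sim.max(dim=-1).values.sum()

q = F.normalize(torch.randn(32, 128), dim=-1)
docs = [F.normalize(torch.randn(n, 128), dim=-1) for n in (180, 95, 400)]
scores = torch.stack([maxsim_score(q, d) for d in docs])
print("best doc index:", scores.argmax().item())
```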
I just left my job to work on my own business. It’s been excellent working at AAI. They have a really amazing vision but I decided it was time to follow my own vision, passions, projects, and products. I am still collaborating with AAI on some stuff though :D
If you use "AI agents" (LLMs that call tools), you need to be aware of the Lethal Trifecta: any time you combine access to private data with exposure to untrusted content and the ability to communicate externally, an attacker can trick the system into stealing your data!