Justus Mattern
@MatternJustus
Research Engineer @PrimeIntellect | prev. co-founder http://re.video (YC S23), research @MPI_IS, physics @RWTH
Whenever I feel whiny about problems I have, I remind myself that as an SF tech worker I live in a city with 259 sunny days, make more money at 22 than I ever would have in Germany, and am surrounded by people who share the same interests and ambitions as me like nowhere else
Noticed some curiosity about the specific score comparison between GSPO and GRPO. From our perspective, we’re more focused on scalability — can we achieve better performance by increasing compute (e.g., training with more steps, extending generation length, regularly updating…
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
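For reference, a rough sketch of the sequence-level clipped objective GSPO optimizes, based on my reading of the paper; the function, variable names, and tensor shapes below are my own, and this is not the Qwen training code.

```python
import torch

def gspo_loss(logp_new, logp_old, rewards, lengths, eps=0.2):
    """Toy sketch of GSPO's sequence-level clipped objective (my reading of the paper).

    logp_new, logp_old: summed log-probs of each sampled response under the current
                        and old policy, shape (G,) for a group of G responses
    rewards:            scalar reward per response, shape (G,)
    lengths:            response lengths in tokens, shape (G,)
    """
    # Group-normalized advantage, as in GRPO
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    # Sequence-level importance ratio, length-normalized (geometric mean over tokens)
    ratio = torch.exp((logp_new - logp_old) / lengths)
    # PPO-style clipping applied to whole sequences rather than individual tokens
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv
    return -torch.min(unclipped, clipped).mean()
```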
Ideogram 3.0 by @ideogram_ai is ridiculously underrated
It’s crazy fast and quickly climbing the ranks as one of the best image models on @designarena_ai
📣 Model Drop: Ideogram 3.0 (text-to-image) by @ideogram_ai just dropped on DesignArena.ai
Just landed in SF, I'm now stranded here without a desk while the rest of my team is still in Europe. If anyone can host me at their office for (one of) the next few days, please let me know (DMs open 🥹👉👈)
While LLMs are good at generating functionally correct frontend code, it’s stunning how bad AI-generated UIs are; I’m certain that this can become better with appropriate evals and reward signals
Really excited about this leaderboard and the very hard-working team behind it!
Three weeks ago, we started building an AI game engine. But some models kept making things look... sloppy. So we turned finding the best one into a game. In three weeks, that game grew to 35K+ users across 135 countries. Introducing @designarena_ai, the fastest-growing…
How to train the best non-reasoning model:
1. Gather a reasoning dataset
2. Remove <think> tokens
3. Train the model
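In case anyone wants to try literally that, here is a minimal sketch of step 2, assuming the traces keep the reasoning inline as <think>…</think> blocks; the "completion" field name is a placeholder, adapt it to your dataset schema.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think(example):
    # Drop the reasoning block, keep only the final answer text.
    # "completion" is a made-up field name for this sketch.
    example["completion"] = THINK_BLOCK.sub("", example["completion"]).strip()
    return example

# e.g. with a Hugging Face dataset: dataset = dataset.map(strip_think)
```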
this year alone, I've met hundreds of the world's elite AI researchers + engineers steering the future of intelligence, and they ultimately want to do it here, in the US 🇺🇸 If a visa or green card is holding you back, join us on 7/31 in SF and hear real stories from…
RL with predefined tools does not matter in the long term; the most bitter-lesson-pilled approach is giving the model a single universal tool (a computer)
ChatGPT agent’s capabilities are reflected in its state-of-the-art performance on academic and real-world task evaluations, like data modeling, spreadsheet editing, and investment banking.
my piquant quantization kernels are almost 50 times faster than PyTorch's on the CPU. PyTorch's sub-byte quantization (torch.quint4x2, torch.quint2x4) is quite slow. 1/2
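For context on why sub-byte formats need custom kernels: two 4-bit values share a single byte, so every load and store is bit twiddling. A toy sketch of the packing in pure Python (not the actual kernels):

```python
def pack_int4(values):
    """Pack pairs of unsigned 4-bit ints (0..15) into single bytes, low nibble first."""
    assert len(values) % 2 == 0
    out = bytearray()
    for lo, hi in zip(values[::2], values[1::2]):
        out.append((hi & 0xF) << 4 | (lo & 0xF))
    return bytes(out)

def unpack_int4(packed):
    """Inverse of pack_int4: recover the original 4-bit values."""
    values = []
    for byte in packed:
        values.extend([byte & 0xF, byte >> 4])
    return values

assert unpack_int4(pack_int4([1, 15, 7, 0])) == [1, 15, 7, 0]
```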
At #ICML2025 in Vancouver 🇨🇦 this week, presenting some work from my first year at Stanford! Come find me at posters or just around the conference! Thursday: KernelBench: Can LLMs Write Efficient GPU Kernels? 11AM East E-2010 Saturday: Kevin: Multi-Turn RL for Generating…
Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!
Toploc poster session tomorrow (Wed) at 4:30 PM, East Hall E-1106. I’ll be around through Saturday; if you’re into decentralized training & inference, let's chat!
Day 1 of asking for this at @PrimeIntellect HQ
Professional jiu jitsu mats at @Replit HQ in Foster City. Let’s go!
Kimi K2 by @Kimi_Moonshot and Mistral Small 3.2 by @MistralAI just added to the leaderboard
Crown your winner at DesignArena.ai
SYNTHETIC-2 datasets are now on Hugging Face! We’re releasing an SFT dataset collected from the new R1-0528 as well as an RL dataset with difficulty annotations from various smaller models. Go train some models 🫡
Releasing SYNTHETIC-2: our open dataset of 4M verified reasoning traces spanning a comprehensive set of complex RL tasks and verifiers. Created by hundreds of compute contributors across the globe via our pipeline-parallel decentralized inference stack. primeintellect.ai/blog/synthetic…
Highest leverage thing unskilled engineers can do rn to contribute to frontier AI research is vibecoding RL environments
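To make "RL environment" concrete: at its simplest it's a prompt generator plus a programmatic verifier that scores the model's answer. A toy sketch, not tied to any particular framework:

```python
import random

def sample_task(rng=random):
    """Generate a prompt with a known ground-truth answer."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return {"prompt": f"What is {a} * {b}? Answer with just the number.", "answer": str(a * b)}

def reward(task, completion):
    """Verifier: 1.0 if the model's final answer matches, else 0.0."""
    return 1.0 if completion.strip() == task["answer"] else 0.0

task = sample_task()
print(task["prompt"], reward(task, task["answer"]))  # sanity check -> 1.0
```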
First tweet is a banger, I will watch his career with great interest
When you're looking through your training samples
We are so back btw (got submitted in the final but hey)
finally back to grappling regularly after a 6+ month forced break (due to injury) and only now do I realize how much not doing it affected my mental health
great thread, summarizes well why we're particularly excited about RL:
higher inference-to-training compute ratio -> less inter-node communication -> better suited for globally distributed training infra with slow connection speeds
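Toy numbers to illustrate the chain (all made up): if each RL step spends far more compute generating rollouts than on the gradient update, the share of time that needs fast inter-node links shrinks accordingly.

```python
# Made-up illustration: how the inference:training compute ratio shrinks the
# share of total compute that is communication-heavy (numbers are arbitrary).
def comm_share(inference_to_training_ratio, comm_fraction_of_training=0.5):
    training = 1.0
    inference = inference_to_training_ratio * training
    comm = comm_fraction_of_training * training  # rollout generation needs no inter-node traffic here
    return comm / (training + inference)

for ratio in (1, 4, 16, 64):
    print(ratio, f"{comm_share(ratio):.1%}")
# e.g. at a 64:1 ratio the communication-bound portion is under 1% of total compute
```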
So I think something else that doesn't get discussed much is the extrapolation of this inference : training trend
- 2015: back in the day, we would train one model per dataset, and inference it once (to obtain the eval result for our paper)
- 2020: with ChatGPT, multi-task…