Justus Mattern
@MatternJustus
Research Engineer @PrimeIntellect | prev. co-founder http://re.video (YC S23), research @MPI_IS, physics @RWTH
Whenever I feel whiny about problems I have, I remind myself that as an SF tech worker I live in a city with 259 sunny days, make more money at 22 than I ever would have in Germany, and am surrounded by people who share the same interests and ambitions as me like nowhere else
Noticed some curiosity about the specific score comparison between GSPO and GRPO. From our perspective, we’re more focused on scalability — can we achieve better performance by increasing compute (e.g., training with more steps, extending generation length, regularly updating…
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
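For reference, a rough sketch of the sequence-level clipped objective GSPO optimizes, based on my reading of the paper; the function, variable names, and tensor shapes below are my own, and this is not the Qwen training code.

```python
import torch

def gspo_loss(logp_new, logp_old, rewards, lengths, eps=0.2):
    """Toy sketch of GSPO's sequence-level clipped objective (my reading of the paper).

    logp_new, logp_old: summed log-probs of each sampled response under the current
                        and old policy, shape (G,) for a group of G responses
    rewards:            scalar reward per response, shape (G,)
    lengths:            response lengths in tokens, shape (G,)
    """
    # Group-normalized advantage, as in GRPO
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    # Sequence-level importance ratio, length-normalized (geometric mean over tokens)
    ratio = torch.exp((logp_new - logp_old) / lengths)
    # PPO-style clipping applied to whole sequences rather than individual tokens
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv
    return -torch.min(unclipped, clipped).mean()
```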
Ideogram 3.0 by @ideogram_ai is ridiculously underrated
It’s crazy fast and quickly climbing the ranks as one of the best image models on @designarena_ai
📣 Model Drop: Ideogram 3.0 (text-to-image) by @ideogram_ai just dropped on DesignArena.ai
Just landed in SF, I'm now stranded here without a desk while the rest of my team is still in Europe. If anyone can host me at their office for (one of) the next few days, please let me know (DMs open 🥹👉👈)
While LLMs are good at generating functionally correct frontend code, it’s stunning how bad AI-generated UIs are; I’m certain that this can become better with appropriate evals and reward signals
Really excited about this leaderboard and the very hard-working team behind it!
Three weeks ago, we started building an AI game engine. But some models kept making things look... sloppy. So we turned finding the best one into a game. In three weeks, that game grew to 35K+ users across 135 countries. Introducing @designarena_ai, the fastest-growing…
How to train the best non-reasoning model:
1. Gather a reasoning dataset
2. Remove <think> tokens
3. Train the model
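In case anyone wants to try literally that, here is a minimal sketch of step 2, assuming the traces keep the reasoning inline as <think>…</think> blocks; the "completion" field name is a placeholder, adapt it to your dataset schema.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think(example):
    # Drop the reasoning block, keep only the final answer text.
    # "completion" is a made-up field name for this sketch.
    example["completion"] = THINK_BLOCK.sub("", example["completion"]).strip()
    return example

# e.g. with a Hugging Face dataset: dataset = dataset.map(strip_think)
```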
this year alone, I've met hundreds of the world's elite AI researchers + engineers steering the future of intelligence, and they ultimately want to do it here, in the US 🇺🇸 If a visa or green card is holding you back, join us on 7/31 in SF and hear real stories from…
RL with predefined tools does not matter in the long term; the most bitter-lesson-pilled approach is giving the model a single universal tool (a computer)
ChatGPT agent’s capabilities are reflected in its state-of-the-art performance on academic and real-world task evaluations, like data modeling, spreadsheet editing, and investment banking.
my piquant quantization kernels are almost 50 times faster than PyTorch's on the CPU. PyTorch's sub-byte quantization (torch.quint4x2, torch.quint2x4) is quite slow. 1/2
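For context on why sub-byte formats need custom kernels: two 4-bit values share a single byte, so every load and store is bit twiddling. A toy sketch of the packing in pure Python (not the actual kernels):

```python
def pack_int4(values):
    """Pack pairs of unsigned 4-bit ints (0..15) into single bytes, low nibble first."""
    assert len(values) % 2 == 0
    out = bytearray()
    for lo, hi in zip(values[::2], values[1::2]):
        out.append((hi & 0xF) << 4 | (lo & 0xF))
    return bytes(out)

def unpack_int4(packed):
    """Inverse of pack_int4: recover the original 4-bit values."""
    values = []
    for byte in packed:
        values.extend([byte & 0xF, byte >> 4])
    return values

assert unpack_int4(pack_int4([1, 15, 7, 0])) == [1, 15, 7, 0]
```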
At #ICML2025 in Vancouver 🇨🇦 this week, presenting some work from my first year at Stanford! Come find me at posters or just around the conference! Thursday: KernelBench: Can LLMs Write Efficient GPU Kernels? 11AM East E-2010 Saturday: Kevin: Multi-Turn RL for Generating…
Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!
Toploc poster session tomorrow (Wed) at 4:30 PM, East Hall E-1106. I’ll be around through Saturday; if you’re into decentralized training & inference, let's chat!
Day 1 of asking for this at @PrimeIntellect HQ
Professional jiu jitsu mats at @Replit HQ in Foster City. Let’s go!
Kimi K2 by @Kimi_Moonshot and Mistral Small 3.2 by @MistralAI just added to the leaderboard
Crown your winner at DesignArena.ai
SYNTHETIC-2 datasets are now on Hugging Face! We’re releasing an SFT dataset collected from the new R1-0528 as well as an RL dataset with difficulty annotations from various smaller models. Go train some models 🫡
Releasing SYNTHETIC-2: our open dataset of 4M verified reasoning traces spanning a comprehensive set of complex RL tasks and verifiers. Created by hundreds of compute contributors across the globe via our pipeline-parallel decentralized inference stack. primeintellect.ai/blog/synthetic…
Highest leverage thing unskilled engineers can do rn to contribute to frontier AI research is vibecoding RL environments
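To make "RL environment" concrete: at its simplest it's a prompt generator plus a programmatic verifier that scores the model's answer. A toy sketch, not tied to any particular framework:

```python
import random

def sample_task(rng=random):
    """Generate a prompt with a known ground-truth answer."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return {"prompt": f"What is {a} * {b}? Answer with just the number.", "answer": str(a * b)}

def reward(task, completion):
    """Verifier: 1.0 if the model's final answer matches, else 0.0."""
    return 1.0 if completion.strip() == task["answer"] else 0.0

task = sample_task()
print(task["prompt"], reward(task, task["answer"]))  # sanity check -> 1.0
```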
First tweet is a banger, I will watch his career with great interest
When you're looking through your training samples
We are so back btw (got submitted in the final but hey)
finally back to grappling regularly after a 6+ month forced break (due to injury) and only now do I realize how much not doing it affected my mental health
great thread, summarizes well why we're particularly excited about RL:
higher inference-to-training compute ratio -> less inter-node communication -> better suited for globally distributed training infra with slow connection speeds
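Toy numbers to illustrate the chain (all made up): if each RL step spends far more compute generating rollouts than on the gradient update, the share of time that needs fast inter-node links shrinks accordingly.

```python
# Made-up illustration: how the inference:training compute ratio shrinks the
# share of total compute that is communication-heavy (numbers are arbitrary).
def comm_share(inference_to_training_ratio, comm_fraction_of_training=0.5):
    training = 1.0
    inference = inference_to_training_ratio * training
    comm = comm_fraction_of_training * training  # rollout generation needs no inter-node traffic here
    return comm / (training + inference)

for ratio in (1, 4, 16, 64):
    print(ratio, f"{comm_share(ratio):.1%}")
# e.g. at a 64:1 ratio the communication-bound portion is under 1% of total compute
```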
So I think something else that doesn't get discussed much is the extrapolation of this inference : training trend
- 2015: back in the day, we would train one model per dataset, and inference it once (to obtain the eval result for our paper)
- 2020: with ChatGPT, multi-task…