Dimitri von Rütte
@dvruette
PhD @ETH_en, prev. Machine Learning @DeepJudgeAI
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
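A minimal sketch of the general idea, not the GIDD training or sampling procedure: one generic way a masked discrete diffusion LM can revise its own output is to re-mask its least confident tokens and denoise them again. Everything below (`DenoiserStub`, `self_correct`, the confidence heuristic) is an illustrative assumption, not code from the paper.

```python
# Hedged sketch: self-correction by re-masking low-confidence tokens and re-denoising.
import torch
import torch.nn.functional as F

VOCAB, MASK_ID, SEQ_LEN = 1000, 0, 16

class DenoiserStub(torch.nn.Module):
    """Placeholder for a trained denoiser: maps token ids to per-token logits."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, 64)
        self.head = torch.nn.Linear(64, VOCAB)
    def forward(self, x):
        return self.head(self.emb(x))  # (B, L, VOCAB)

@torch.no_grad()
def self_correct(model, tokens, rounds=3, remask_frac=0.1):
    """Iteratively re-mask the least confident positions and re-predict them."""
    for _ in range(rounds):
        probs = F.softmax(model(tokens), dim=-1)
        # confidence the model assigns to the tokens currently in the sequence
        conf = probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
        k = max(1, int(remask_frac * tokens.shape[1]))
        _, idx = conf.topk(k, dim=1, largest=False)   # least confident positions
        masked = tokens.scatter(1, idx, MASK_ID)      # re-noise them with the mask token
        resampled = model(masked).argmax(-1)          # denoise again (greedy for brevity)
        tokens = torch.where(masked == MASK_ID, resampled, masked)
    return tokens

model = DenoiserStub()
draft = torch.randint(1, VOCAB, (2, SEQ_LEN))
print(self_correct(model, draft).shape)  # torch.Size([2, 16])
```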
📢 Duo and Eso-LMs at 2B scale on Slim Pajama. These models will finish training in a few days. While the HF release may take time due to corporate red tape, we'll try to provide early access on a case-by-case basis. Email [email protected] with the subject “Early access”. Duo:…
Great energy today at the ICML diffusion meetup! Was also very happy to find a decent number of discrete diffusion people mixed in the crowd!
We are sitting all the way at the back of the conference center (west building)!
Hello #ICML2025👋, anyone up for a diffusion circle? We'll just sit down somewhere and talk shop. 🕒Join us at 3PM on Thursday July 17. We'll meet here (see photo, near the west building's west entrance), and venture out from there to find a good spot to sit. Tell your friends!
Very cool initiative! Excited to see how far we can push agents on this very challenging task (especially track 2…)
🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. Two tracks: ① Showdown Battling – imperfect-info, turn-based strategy ② Pokémon Emerald Speedrunning – long-horizon RPG planning. 5M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid…
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech interp methods implicitly rely on the linear representation hypothesis🧵
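For concreteness, here is a hedged sketch of the kind of intervention the thread alludes to: steering a hidden state along a candidate feature direction, or projecting that direction out. The operation only makes sense if the feature really is a linear direction in activation space, which is exactly the linear representation hypothesis. The tensors and helper names below are illustrative assumptions, not code from the paper.

```python
# Hedged sketch: linear interventions on activations (steering / ablation).
import torch

hidden = torch.randn(4, 512)        # batch of hidden activations at some layer
feature_dir = torch.randn(512)      # candidate "feature" direction (e.g. from a probe)
feature_dir = feature_dir / feature_dir.norm()

def steer(h, direction, alpha=2.0):
    """Add a scaled feature direction to the activations (linear intervention)."""
    return h + alpha * direction

steered = steer(hidden, feature_dir)
# Ablation: project the feature direction out of each activation vector.
ablated = hidden - (hidden @ feature_dir)[:, None] * feature_dir
```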
The bitter lesson strikes again 😬 Less inductive bias scales better for molecule generation as well! Forcing the model to be rotation-equivariant is like putting it on training wheels: initial learning is made easier, but we can only achieve peak performance by taking them off.
Is equivariance necessary for a good 3D molecule generative model? Check out our #icml2025 paper, which closes the performance gap between non-equivariant and equivariant diffusion models via rotational alignment, while also being more efficient (1/7): arxiv.org/abs/2506.10186
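A rough sketch of what rotational alignment can look like when training a non-equivariant model: rotate the target conformer onto the prediction (Kabsch-style) before computing the coordinate loss, so the model isn't penalized for an arbitrary global rotation. This is my own illustrative assumption of such a setup, not the paper's exact procedure.

```python
# Hedged sketch: Kabsch-style rotational alignment before a coordinate loss.
import torch

def kabsch_align(pred, target):
    """Return `target` rotated to best match `pred` (both (N, 3), assumed centered)."""
    H = target.T @ pred                           # 3x3 cross-covariance
    U, S, Vt = torch.linalg.svd(H)
    d = torch.sign(torch.linalg.det(Vt.T @ U.T))  # guard against reflections
    D = torch.eye(3)
    D[2, 2] = d
    R = Vt.T @ D @ U.T                            # optimal rotation
    return target @ R.T

pred = torch.randn(10, 3)
target = torch.randn(10, 3)
pred, target = pred - pred.mean(0), target - target.mean(0)   # center both point clouds
aligned = kabsch_align(pred, target)
loss = ((pred - aligned) ** 2).mean()             # rotation-invariant coordinate loss
```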