Dimitri von Rütte
@dvruette
PhD @ETH_en, prev. Machine Learning @DeepJudgeAI
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
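A minimal sketch of the general idea, not the GIDD training or sampling procedure: one generic way a masked discrete diffusion LM can revise its own output is to re-mask its least confident tokens and denoise them again. Everything below (`DenoiserStub`, `self_correct`, the confidence heuristic) is an illustrative assumption, not code from the paper.

```python
# Hedged sketch: self-correction by re-masking low-confidence tokens and re-denoising.
import torch
import torch.nn.functional as F

VOCAB, MASK_ID, SEQ_LEN = 1000, 0, 16

class DenoiserStub(torch.nn.Module):
    """Placeholder for a trained denoiser: maps token ids to per-token logits."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, 64)
        self.head = torch.nn.Linear(64, VOCAB)
    def forward(self, x):
        return self.head(self.emb(x))  # (B, L, VOCAB)

@torch.no_grad()
def self_correct(model, tokens, rounds=3, remask_frac=0.1):
    """Iteratively re-mask the least confident positions and re-predict them."""
    for _ in range(rounds):
        probs = F.softmax(model(tokens), dim=-1)
        # confidence the model assigns to the tokens currently in the sequence
        conf = probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
        k = max(1, int(remask_frac * tokens.shape[1]))
        _, idx = conf.topk(k, dim=1, largest=False)   # least confident positions
        masked = tokens.scatter(1, idx, MASK_ID)      # re-noise them with the mask token
        resampled = model(masked).argmax(-1)          # denoise again (greedy for brevity)
        tokens = torch.where(masked == MASK_ID, resampled, masked)
    return tokens

model = DenoiserStub()
draft = torch.randint(1, VOCAB, (2, SEQ_LEN))
print(self_correct(model, draft).shape)  # torch.Size([2, 16])
```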
📢 Duo and Eso-LMs at 2B scale on Slim Pajama. These models will finish training in a few days. While the HF release may take time due to corporate red tape, we'll try to provide early access on a case-by-case basis. Email [email protected] with the subject “Early access”. Duo:…
Great energy today at the ICML diffusion meetup! Was also very happy to find a decent number of discrete diffusion people mixed in the crowd!
We are sitting all the way at the back of the conference center (west building)!
Hello #ICML2025👋, anyone up for a diffusion circle? We'll just sit down somewhere and talk shop. 🕒Join us at 3PM on Thursday July 17. We'll meet here (see photo, near the west building's west entrance), and venture out from there to find a good spot to sit. Tell your friends!
Very cool initiative! Excited to see how far we can push agents on this very challenging task (especially track 2…)
🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. Two tracks: ① Showdown Battling – imperfect-info, turn-based strategy ② Pokémon Emerald Speedrunning – long-horizon RPG planning. 5M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid…
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech interp methods implicitly rely on the linear representation hypothesis🧵
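For concreteness, here is a hedged sketch of the kind of intervention the thread alludes to: steering a hidden state along a candidate feature direction, or projecting that direction out. The operation only makes sense if the feature really is a linear direction in activation space, which is exactly the linear representation hypothesis. The tensors and helper names below are illustrative assumptions, not code from the paper.

```python
# Hedged sketch: linear interventions on activations (steering / ablation).
import torch

hidden = torch.randn(4, 512)        # batch of hidden activations at some layer
feature_dir = torch.randn(512)      # candidate "feature" direction (e.g. from a probe)
feature_dir = feature_dir / feature_dir.norm()

def steer(h, direction, alpha=2.0):
    """Add a scaled feature direction to the activations (linear intervention)."""
    return h + alpha * direction

steered = steer(hidden, feature_dir)
# Ablation: project the feature direction out of each activation vector.
ablated = hidden - (hidden @ feature_dir)[:, None] * feature_dir
```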
The bitter lesson strikes again 😬 Less inductive bias scales better for molecule generation as well! Forcing the model to be rotation-equivariant is like putting it on training wheels: initial learning is made easier, but we can only achieve peak performance by taking them off.
Is equivariance necessary for a good 3D molecule generative model? Check out our #icml2025 paper, which closes the performance gap between non-equivariant and equivariant diffusion models via rotational alignment, while also being more efficient (1/7): arxiv.org/abs/2506.10186
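A rough sketch of what rotational alignment can look like when training a non-equivariant model: rotate the target conformer onto the prediction (Kabsch-style) before computing the coordinate loss, so the model isn't penalized for an arbitrary global rotation. This is my own illustrative assumption of such a setup, not the paper's exact procedure.

```python
# Hedged sketch: Kabsch-style rotational alignment before a coordinate loss.
import torch

def kabsch_align(pred, target):
    """Return `target` rotated to best match `pred` (both (N, 3), assumed centered)."""
    H = target.T @ pred                           # 3x3 cross-covariance
    U, S, Vt = torch.linalg.svd(H)
    d = torch.sign(torch.linalg.det(Vt.T @ U.T))  # guard against reflections
    D = torch.eye(3)
    D[2, 2] = d
    R = Vt.T @ D @ U.T                            # optimal rotation
    return target @ R.T

pred = torch.randn(10, 3)
target = torch.randn(10, 3)
pred, target = pred - pred.mean(0), target - target.mean(0)   # center both point clouds
aligned = kabsch_align(pred, target)
loss = ((pred - aligned) ** 2).mean()             # rotation-invariant coordinate loss
```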