Justin Deschenaux
@jdeschena
PhD student @EPFL advised by @caglarml. Working on diffusion language models ⚡️
🌟 Excited to share our latest work on making diffusion language models (DLMs) faster than autoregressive (AR) models! ⚡ It’s been great to work on this with @caglarml 😎 Lately, DLMs are gaining traction as a promising alternative to autoregressive sequence modeling 👀 1/14 🧵
Inverse Scaling in Test-Time Compute "We identify five distinct failure modes when models reason for longer: 1) Claude models become increasingly distracted by irrelevant information; 2) OpenAI o-series models resist distractors but overfit to problem framings; 3) models shift…
If you’re interested in long-context efficiency, don’t miss our recent paper RAT—a joint effort with @anunay_yadav, Razvan Pascanu, and @caglarml. While much work already focuses on linear attention (LA) or SSMs, we explore a different way to inject recurrence into softmax-based attention: chunking!…
Many people still talk about coming up with alternatives to self-attention, while acknowledging that both self-attention and SSMs have real strengths. We explored various LLM designs to bridge the gap and achieve the best of both worlds. 🚀 We introduce RAT 🐀—our hybrid…
Curious about making Transformers faster on long sequences without compromising accuracy? ⚡️🧠 Meet RAT—an intermediate design between RNN and softmax attention. The results? Faster and lighter like RNNs, with strong performance like Attention! 🐭✨
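For readers wondering what "injecting recurrence into softmax attention by chunking" could look like, here is a minimal toy sketch, not the paper's actual RAT architecture: a GRU runs inside each chunk, softmax attention mixes the chunk summaries, and the attention length drops from T to T / chunk_size. The module name, the GRU, and the last-state chunk summary are all my own illustrative choices.

```python
# Toy sketch (NOT the actual RAT design): one way to inject recurrence
# into softmax attention by chunking. Recurrence runs within each chunk;
# attention then mixes chunk-level summaries, so the quadratic attention
# operates over T / chunk_size positions instead of T.
import torch
import torch.nn as nn

class ChunkedRecurrentAttention(nn.Module):
    def __init__(self, dim: int, chunk_size: int, num_heads: int = 4):
        super().__init__()
        self.chunk_size = chunk_size
        self.rnn = nn.GRU(dim, dim, batch_first=True)   # intra-chunk recurrence
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        assert T % self.chunk_size == 0, "pad T to a multiple of chunk_size"
        n = T // self.chunk_size
        # Run the RNN independently inside each chunk.
        chunks = x.reshape(B * n, self.chunk_size, D)
        states, _ = self.rnn(chunks)
        # Summarize each chunk by its last recurrent state.
        summary = states[:, -1].reshape(B, n, D)
        # Softmax attention over the (much shorter) chunk sequence.
        # (No causal masking here; a real LM variant would need it.)
        mixed, _ = self.attn(summary, summary, summary)
        # Broadcast the mixed chunk context back to every token.
        ctx = mixed.repeat_interleave(self.chunk_size, dim=1)
        return states.reshape(B, T, D) + ctx

x = torch.randn(2, 64, 32)
y = ChunkedRecurrentAttention(dim=32, chunk_size=16)(x)
print(y.shape)  # torch.Size([2, 64, 32])
```

The point of the toy: attention cost shrinks by a factor of chunk_size while the recurrence keeps token-level detail, which is the efficiency/accuracy trade the tweets describe; the real RAT design differs in the details.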
🔥 NEW PAPER: "The Diffusion Duality" Uniform-state diffusion models for text generation emerge from an underlying continuous Gaussian diffusion process! This connection lets us transfer powerful techniques between continuous and discrete domains 👀 1/9🧵x.com/ssahoo_/status…
🚨 “The Diffusion Duality” is out! @ICML2025 ⚡️ Few-step generation in discrete diffusion language models by exploiting the underlying Gaussian diffusion. 🦾Beats AR on 3/7 zero-shot likelihood benchmarks. 📄 Paper: arxiv.org/abs/2506.10892 💻 Code: github.com/s-sahoo/duo 🧠…
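A tiny numerical check of the claimed connection, my own sketch rather than the paper's code: Gaussian-noise a one-hot token and take the argmax. At low noise the argmax returns the true token; at high noise it becomes uniform over the vocabulary, matching the two endpoints of a uniform-state discrete diffusion. The vocabulary size, noise scales, and sample count below are arbitrary.

```python
# Monte Carlo illustration of the duality claim (my sketch, not the
# paper's code): argmax of a Gaussian-noised one-hot vector interpolates
# between "keep the true token" and "uniform over the vocabulary".
import torch

V, N = 8, 200_000                   # vocab size, number of samples (arbitrary)
x0 = torch.zeros(V); x0[3] = 1.0    # the "clean" token is index 3

for sigma in [0.1, 0.5, 1.0, 3.0]:
    noisy = x0 + sigma * torch.randn(N, V)
    tokens = noisy.argmax(dim=-1)
    p_keep = (tokens == 3).float().mean().item()
    print(f"sigma={sigma:4.1f}  P(argmax = true token) = {p_keep:.3f}")
# As sigma grows, P(true token) falls toward 1/V = 0.125, i.e. the
# corrupted token becomes uniform over the vocabulary.
```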
Sadly, I am no longer a professor at ETH (@eth_en) due to very severe #longCovid and #MECFS. ethrat.ch/de/ernennungen….
Attending ICML ✈️Tues-Fri to present "The Diffusion Duality" 🗓️Wed, July 16 @ 4:30pm 📍East Exhibition Hall A-B (E-3003) DM if you want to chat about diffusion LMs, or my current work on Duality or Esoteric LMs! x.com/ssahoo_/status…
🚀 Big time! We can finally do LLM RL fine-tuning with rewards and leverage offline/off-policy data! ❌ You want rewards, but GRPO only works online? ❌ You want offline, but DPO is limited to preferences? ✅ QRPO can do both! 🧵Here's how we do it:
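To make the "offline RL fine-tuning with scalar rewards" setting concrete, here is a minimal generic sketch of reward-weighted maximum likelihood. This is explicitly not the QRPO objective; `model` (an HF-style causal LM returning `.logits`), `beta`, and the batch-softmax weighting are all illustrative assumptions.

```python
# Generic illustration of offline fine-tuning from scalar rewards
# (NOT the QRPO loss, just the simplest reward-weighted log-likelihood
# baseline, to make the setting the tweet describes concrete).
import torch
import torch.nn.functional as F

def reward_weighted_nll(model, input_ids, rewards, beta=1.0):
    """input_ids: (B, T) offline completions; rewards: (B,) scalars."""
    logits = model(input_ids).logits[:, :-1]           # predict next token
    targets = input_ids[:, 1:]
    logp = -F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape).sum(dim=-1)               # (B,) sequence log-probs
    weights = torch.softmax(rewards / beta, dim=0)     # exp-reward weights
    return -(weights * logp).sum()                     # weighted NLL
```

Unlike GRPO, nothing here requires sampling from the current policy, and unlike DPO it consumes raw rewards rather than preference pairs; QRPO's actual objective is what the paper itself spells out.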
I made another video, this time on the paper 'The Diffusion Duality', continuing my series on trying to understand diffusion applied to language models :) Link: youtube.com/watch?v=o_ISAl… I shied away from some of the scarier math - hope my hand-waving is still vaguely useful + correct!
This work uncovers a profound connection between continuous and discrete (non-absorbing) diffusion models, allowing transfer of advanced techniques such as consistency distillation to the discrete setting! Also: amazing title, no notes! 🧑‍🍳😙🤌
The Diffusion Duality unlocks few-step generation in discrete diffusion language models via the underlying Gaussian diffusion
Check out our recent paper on the "duality" between discrete and Gaussian diffusion. We show how to exploit that relationship to speed up discrete diffusion sampling by two orders of magnitude.