Sulin Liu
@su_lin_liu
Postdoc @MIT Ex: Machine Learning PhD @Princeton @Meta @NTUsg @NUSingapore
Discrete generative models use denoisers for generation, but they can slip up. What if generation *isn’t only* about denoising?🤔 Introducing DDPD: Discrete Diffusion with Planned Denoising🤗🧵(1/11) w/ @junonam_ @AndrewC_ML @HannesStaerk @xuyilun2 Tommi Jaakkola @RGBLabMIT
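The core idea, as I read the thread: a separate planner decides which positions still look noisy, and the denoiser only has to fix those. Below is a minimal sketch of that planner-then-denoise loop; `planner` and `denoiser` are hypothetical stand-ins rather than the released DDPD models, and the greedy argmax selection is a simplification of whatever rule the paper actually uses.

```python
import torch

def ddpd_style_sample(planner, denoiser, x, num_steps):
    """Illustrative planner-then-denoise loop (not the official DDPD sampler).

    planner(x)  -> (B, L) scores for how likely each position is still noisy.
    denoiser(x) -> (B, L, V) logits over clean tokens at every position.
    Mutates x in place and returns it.
    """
    b_idx = torch.arange(x.size(0), device=x.device)
    for _ in range(num_steps):
        noisy_scores = planner(x)              # which positions still need work?
        pos = noisy_scores.argmax(dim=-1)      # greedily pick the most suspect one
        probs = denoiser(x)[b_idx, pos].softmax(dim=-1)
        x[b_idx, pos] = torch.multinomial(probs, 1).squeeze(-1)  # resample that token
    return x
```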

🚀 Meet EvaByte: The best open-source tokenizer-free language model! Our 6.5B byte LM matches modern tokenizer-based LMs with 5x less data & 2x faster decoding, naturally extending to multimodal tasks while fixing tokenization quirks. 💻 Blog: bit.ly/3CjEmTC 🧵 1/9
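"Tokenizer-free" here means the model consumes raw UTF-8 bytes, so the input pipeline reduces to something like the sketch below; the optional offset for special tokens is my own illustrative assumption, not EvaByte's actual vocabulary layout.

```python
def bytes_to_ids(text: str, num_special: int = 0) -> list[int]:
    """Map text to byte ids (0-255), optionally shifted past special tokens."""
    return [b + num_special for b in text.encode("utf-8")]

def ids_to_text(ids: list[int], num_special: int = 0) -> str:
    """Inverse mapping; replaces any bytes that don't decode cleanly."""
    return bytes(i - num_special for i in ids).decode("utf-8", errors="replace")

ids = bytes_to_ids("tokenizer-free 🙂")
assert ids_to_text(ids) == "tokenizer-free 🙂"
```

Because every character is just its bytes, there are no out-of-vocabulary tokens and no tokenization quirks to special-case.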
Our lab had a #dogathon 🐕 yesterday where we analyzed NYC Open Data on dog licenses. We learned a lot of dog facts, which I’ll share in this thread 🧵 1) Geospatial trends: Cavalier King Charles Spaniels are common in Manhattan; the opposite is true for Yorkshire Terriers.
which train are you on?🚄🚇🚆 (also me: we need faster trains in the states 😶)

LLaDA with muP. it just works, again. I'm so tired of saying it works. Just use it, and thank me later
i think that all the "pre-training is dead" takes are bad. the issue with these big big models is that they are capped by dogwater human-labeled post-training data. we shall continue to scale by exploiting RL with verifiable rewards. excited to see gpt-4.5 be used as a base for the next o model.
LLMs have complex joint beliefs about all sorts of quantities. And my postdoc @jamesrequeima visualized them! In this thread we show LLM predictive distributions conditioned on data and free-form text. LLMs pick up on all kinds of subtle and unusual structure: 🧵
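A rough sketch of how such a predictive distribution can be elicited in practice: condition a prompt on the observed data plus free-form text, sample many completions at nonzero temperature, and histogram the parsed numbers. `sample_completion` is a hypothetical LLM call, and this is not the visualization code from the thread.

```python
import re
from collections import Counter

def llm_predictive_histogram(sample_completion, observed, context, n_samples=200):
    """Empirical predictive distribution for the next value in a series.

    sample_completion(prompt) -> str is a hypothetical LLM call (temperature > 0).
    observed: numbers seen so far; context: free-form text conditioning.
    """
    prompt = (
        f"{context}\n"
        f"Observed values: {', '.join(str(v) for v in observed)}\n"
        "Next value:"
    )
    draws = []
    for _ in range(n_samples):
        reply = sample_completion(prompt)
        m = re.search(r"-?\d+(?:\.\d+)?", reply)   # take the first number in the reply
        if m:
            draws.append(float(m.group()))
    return Counter(round(v, 1) for v in draws)     # coarse histogram over the samples
```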
Excited to share that I’ve been working on scaling up diffusion language models at Inception. A new generation of LLMs with unprecedented capabilities is coming!
We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
grok also tends to do more solution verification at the end than chatgpt. Clearly this cannot be baked in through just RL from verifiable rewards...
Discrete diffusion (including masked language models) deserves more investment in terms of research and compute, especially when we are running out of pre-training data for autoregressive LLMs. You can get a lot more data for free by just masking data or perturbing it with…
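The "free data" point: every clean sequence yields arbitrarily many training examples, because you can re-corrupt it with a fresh random mask each time. A minimal sketch of that corruption step, assuming a dedicated mask token id and a uniform mask-ratio schedule (both illustrative choices):

```python
import torch

MASK_ID = 0  # illustrative mask token id

def mask_corrupt(tokens: torch.Tensor):
    """Corrupt a batch of sequences by hiding a random fraction of positions.

    Each call produces a new training example from the same clean data:
    the model is trained to recover `tokens` at the masked positions.
    """
    b, l = tokens.shape
    ratio = torch.rand(b, 1)                       # different mask ratio per sequence
    mask = torch.rand(b, l) < ratio                # which positions to hide
    corrupted = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
    return corrupted, mask                         # mask marks the prediction targets
```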
This is really insane. They bet it all and scaled up a discrete diffusion model to llama-7B scale. IIRC nobody dared to try this at this scale, but these madlads did it. They even fine-tuned it to be a dialogue model. This is really frontier-level shit that is genuinely new…
I can’t begin to imagine how strong Anthropic’s internal models must be, since Claude was by far the strongest of the standard non-reasoning models: it’s the only one that could escape getting stuck in loops, a recurring problem that no other LLM has overcome
Excited about this new work where we dig into the role of token order in masked diffusions! MDMs train on some horribly hard tasks, but careful planning at inference can sidestep the hardest ones, dramatically improving over vanilla MDM sampling (e.g. 7%->90% acc on Sudoku) 1/
Check out this new paper on how to do planning for discrete diffusion 👏 Really exciting to see more exploration in this direction🔥
New Paper Alert! 🚀 We introduce Path Planning (P2), a sampling approach that optimizes the token unmasking order in Masked Diffusion Models (MDMs). SOTA results across language, math, code, and biological sequences (protein and RNA)—all without training. arxiv.org/pdf/2502.03540 🧵👇
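The common thread in these planning papers is choosing which masked position to commit next at inference time, instead of unmasking in a fixed or random order. Below is a generic illustration that greedily unmasks the position where the denoiser is most confident; it is not the exact P2 (or DDPD) rule, just the flavor of inference-time planning being discussed.

```python
import torch

def confidence_ordered_unmask(denoiser, x, mask_id, steps):
    """Unmask one position per step, picking the one the model is surest about.

    denoiser(x) -> (B, L, V) logits. This greedy 'easiest-first' order is only an
    illustration of inference-time planning, not the rule from any specific paper.
    Mutates x in place and returns it.
    """
    b_idx = torch.arange(x.size(0), device=x.device)
    for _ in range(steps):
        still_masked = (x == mask_id)
        if not still_masked.any():
            break
        probs = denoiser(x).softmax(dim=-1)
        conf, tok = probs.max(dim=-1)                    # per-position confidence
        conf = conf.masked_fill(~still_masked, -1.0)     # only consider masked slots
        pos = conf.argmax(dim=-1)                        # most confident masked position
        x[b_idx, pos] = tok[b_idx, pos]                  # commit that token
    return x
```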
This quirky topic summarization (edge case?) somehow made my day😂
