Jiacheng Ye
@JiachengYe15
Ph.D. student at @HKUniversity.
🚀Excited to announce Dream 7B (Diffusion reasoning model): the most powerful open diffusion large language model to date.

Excited to bring Qwen3-Coder into the browser and terminal world! Building the scaffolding and environments for this big guy to play and learn is tough but incredibly "rewarding". Agentic coding and browsing are arguably the two most important skills for digital agents: they…
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
After three intense months of hard work with the team, we made it! We hope this release can help drive the progress of Coding Agents. Looking forward to seeing Qwen3-Coder continue creating new possibilities across the digital world!
We present DreamOn: a simple yet effective method for variable-length generation in diffusion language models. Our approach significantly boosts code-infilling performance and even catches up to oracle results.
🚀 Thrilled to announce Dream-Coder 7B — the most powerful open diffusion code LLM to date.
🤖Can diffusion models write code competitively? Excited to share our latest 7B coding diffusion LLM!!💻 With DiffuCoder, we explore how they decode, why temperature🔥 matters, and how to improve them via coupled-GRPO that speaks diffusion!!📈 Code: github.com/apple/ml-diffu… 🧵
🚨 4B open-recipe model beats Claude-4-Opus 🔓 100% open data, recipe, model weights, and code. Introducing Polaris✨: a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels…
Padding in our non-AR sequence models? Yuck. 🙅 👉 Instead of unmasking, our new work *Edit Flows* performs iterative refinement via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
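For intuition, here is a minimal toy sketch of the edit-based idea in Python (an illustration under assumptions, not the Edit Flows implementation; `propose_edits` stands in for a hypothetical model call): the loop grows and shrinks the sequence via position-relative inserts and deletes, so no mask or pad tokens are ever needed.

```python
from typing import Callable, List, Tuple

Edit = Tuple[str, int, str]  # (op, position, token); op is "insert" or "delete"

def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply inserts/deletes right-to-left so earlier positions stay valid."""
    out = list(tokens)
    for op, pos, tok in sorted(edits, key=lambda e: e[1], reverse=True):
        if op == "insert":
            out.insert(pos, tok)       # grow the sequence in place
        elif op == "delete" and 0 <= pos < len(out):
            del out[pos]               # shrink the sequence in place
    return out

def refine(tokens: List[str],
           propose_edits: Callable[[List[str]], List[Edit]],
           steps: int = 8) -> List[str]:
    """Iteratively refine; length changes freely, no mask/pad tokens needed."""
    for _ in range(steps):
        edits = propose_edits(tokens)  # hypothetical model call
        if not edits:                  # converged: model proposes no edits
            break
        tokens = apply_edits(tokens, edits)
    return tokens
```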
MiMo-VL technical report, models, and evaluation suite are out! 🤗 Models: huggingface.co/XiaomiMiMo/MiM… (or RL) Report: arxiv.org/abs/2506.03569 Evaluation Suite: github.com/XiaomiMiMo/lmm… Looking back, it's incredible that we delivered such compact yet powerful vision-language…
🚨 [New paper alert] Esoteric Language Models (Eso-LMs) First Diffusion LM to support KV caching w/o compromising parallel generation. 🔥 Sets new SOTA on the sampling speed–quality Pareto frontier 🔥 🚀 65× faster than MDLM ⚡ 4× faster than Block Diffusion 📜 Paper:…
🚀The code for Fast-dLLM is now open-source! 💥 Fast-dLLM achieves a 27.6× end-to-end speedup on 1024-token sequences with less than 2% accuracy drop. Check out the code here: github.com/NVlabs/Fast-dL…
🚀 Fast-dLLM: 27.6× Faster Diffusion LLMs with KV Cache & Parallel Decoding 💥 Key Features🌟 - Block-Wise KV Cache Reuses 90%+ attention activations via bidirectional caching (prefix/suffix), enabling 8.1×–27.6× throughput gains with <2% accuracy loss 🔄 -…
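To make the block-wise caching concrete, here is a toy numpy sketch (assumptions only, not the Fast-dLLM code): K/V for the fixed prefix and suffix are computed once and reused across denoising steps, so each step re-encodes only the active block while still attending bidirectionally.

```python
import numpy as np

def kv(x, Wk, Wv):
    return x @ Wk, x @ Wv

def attend(q, K, V):
    scores = q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

d = 16
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
prefix, active, suffix = (rng.normal(size=(n, d)) for n in (32, 8, 32))

# Cache K/V for the finalized prefix and suffix once (bidirectional context).
Kp, Vp = kv(prefix, Wk, Wv)
Ks, Vs = kv(suffix, Wk, Wv)

for _ in range(4):                        # denoising steps for one block
    Ka, Va = kv(active, Wk, Wv)           # only the active block is re-encoded
    K, V = np.concatenate([Kp, Ka, Ks]), np.concatenate([Vp, Va, Vs])
    active = attend(active @ Wq, K, V)    # parallel update of the whole block
```

Reusing the cached prefix/suffix K/V turns per-step full-sequence re-encoding into per-step block re-encoding, which is where the throughput gain comes from.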
🔥 Meet PromptCoT-Mamba The first reasoning model with constant-memory inference to beat Transformers on competition-level math & code ⚡ Efficient decoding: no attention, no KV cache ⚡ +16.0% / +7.1% / +16.6% vs. s1.1-7B on AIME 24 / 25 / LiveCodeBench 🚀 Up to 3.66× faster
🚀 We’re excited to release Dimple — one of the first Discrete Diffusion MLLMs for visual understanding! 🔥 Surpasses LLaVA-NEXT ⚡ Parallel decoding & ultra-fast inference (sometimes faster than AR) 🎯 Precise response control
This is so cool! Glad to see dLLMs advancing!
We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
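For a sense of what "refining noise step by step" looks like for discrete text, here is a minimal generic masked-diffusion sampler (an illustration, not Gemini Diffusion's actual method; `model` is a hypothetical stand-in returning per-position token probabilities): start fully masked and commit the most confident predictions each round.

```python
import numpy as np

def sample(model, length: int, steps: int = 8, mask_id: int = 0):
    tokens = np.full(length, mask_id)          # pure "noise": everything masked
    for step in range(steps):
        probs = model(tokens)                  # (length, vocab) per-position probs
        conf, guess = probs.max(-1), probs.argmax(-1)
        conf = np.where(tokens == mask_id, conf, -np.inf)  # skip committed slots
        remaining = int((tokens == mask_id).sum())
        if remaining == 0:
            break
        k = max(1, remaining // (steps - step))  # commit a share each round
        for pos in np.argsort(-conf)[:k]:        # most confident masked slots
            tokens[pos] = guess[pos]
    return tokens
```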
Introducing d1🚀 — the first framework that applies reinforcement learning to improve reasoning in masked diffusion LLMs (dLLMs). Combining masked SFT with a novel policy-gradient algorithm, d1 significantly boosts the performance of pretrained dLLMs like LLaDA.
🚀 Meet PromptCoT-QwQ-32B, a breakthrough in mathematical reasoning! Outperforming all open-source models on AIME2024 and AIME2025, including Nemotron-Ultra-253B, DeepSeek-R1-671B, and QwQ-32B! 🔥
🎮 Computer Use Agent Arena is LIVE! 🚀 🔥 Easiest way to test computer-use agents in the wild without any setup 🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL, and more 🕹️ Test agents on 100+ real apps & websites with one-click config 🔒 Safe & free…
Apolinario built a nice demo for Dream 7B; give it a try!
The Dream 7B (diffusion reasoning language model) is OUT! 🚨 I built a demo so you can test it out (and check the diffusion process live) 𖣯🔍