Lin Zheng
@linzhengisme
Ph.D. student @ HKU
🚀 Meet EvaByte: The best open-source tokenizer-free language model! Our 6.5B byte LM matches modern tokenizer-based LMs with 5x less data & 2x faster decoding, naturally extending to multimodal tasks while fixing tokenization quirks. 💻 Blog: bit.ly/3CjEmTC 🧵 1/9
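To make the "tokenizer-free" idea concrete, here is a minimal sketch of byte-level tokenization as used by byte LMs like EvaByte: every UTF-8 byte is its own token, so the vocabulary is just the 256 byte values (plus whatever special tokens a given model adds; EvaByte's actual specials and preprocessing are not shown and the helper names below are illustrative assumptions).

```python
def bytes_to_ids(text: str) -> list[int]:
    """Map text to token IDs: one ID per UTF-8 byte, no merge rules."""
    return list(text.encode("utf-8"))

def ids_to_text(ids: list[int]) -> str:
    """Invert the mapping; lossless for any valid UTF-8 byte sequence."""
    return bytes(ids).decode("utf-8")

ids = bytes_to_ids("héllo")           # 'é' spans two bytes in UTF-8
assert ids == [104, 195, 169, 108, 108, 111]
assert ids_to_text(ids) == "héllo"
assert max(ids) < 256                 # vocabulary never exceeds 256 bytes
```

Because the mapping is fixed and reversible, there are no tokenization quirks (no merge-dependent splits, no out-of-vocabulary pieces), at the cost of longer sequences per character.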

Wrapped up a SWE-Perf website redesign using Qwen3-Coder on AnyCoder (huggingface.co/spaces/akhaliq…). The process was incredibly fast and smooth! One question for Qwen devs, though: did you pretrain a secret love for the color purple into the coder's persona? 😉
The most rewarding moment in research: hearing someone say "This actually works in our scenario!" ✨
Countless iterations went into cooking it, but the process is satisfying. I still believe we could pour more data into each stage if we had more hands, so the potential is unlimited and the scaling law hasn't hit the wall yet! Towards Digital Agents🤖 We are already on the way.
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
Apart from the performance, it’s pure entertainment just watching Qwen3‑Coder build Qwen Code all by itself. Agentic coding is really something: it explores, understands, plans, and acts seamlessly. Honored to be “in the game”—even if my entire work so far is smashing the Enter…
Excited to bring Qwen3-Coder into the browser and terminal world! Building the scaffolding and environments for this big guy to play and learn is tough but incredibly "rewarding". Agentic coding and browsing are arguably the two most important skills for digital agents: they…
Tired of your 1T param language model loss plateauing ~0.6-1.3? Simple solution: cheat by learning a latent language with better characteristics than English! Provocative title aside, I explored whether machines could develop their own "language" optimized for AI vs humans. 🧵
Xinyu Yang from CMU will be giving a talk titled "Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation" on Friday, July 25 at 11am HKT (Thursday, July 24 at 8pm PDT). Link to talk: hku.zoom.us/j/92651812689?…
SWE-Perf Can Language Models Optimize Code Performance on Real-World Repositories?
🔥 LLMs can fix bugs, but can they make your code faster? We put them to the test on real-world repositories, and the results are in! 🚀 New paper: "SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?" Key findings: 1️⃣ We introduce SWE-Perf, the…
We should also turn our attention to the Dream series — an amazing research group that's steadily building the foundation for dLLMs.
What happened after Dream 7B? First, Dream-Coder 7B: a fully open diffusion LLM for code delivering strong performance, trained exclusively on public data. Plus, DreamOn cracks the variable-length generation problem! It enables code infilling that goes beyond a fixed canvas.
🧶1/ Diffusion-based LLMs (dLLMs) are fast & promising—but hard to fine-tune with RL. Why? Because their likelihoods are intractable, making common RL (like GRPO) inefficient & biased. 💡We present a novel method, 𝐰𝐝𝟏, that mitigates these headaches. Let’s break it down.👇
Incredible to see how fast the field moves since we worked on masked diffusion arxiv.org/abs/2406.04329. Huge congratulations to @Jaeyeon_Kim_0 @sitanch for the award!
#ICML2025 Outstanding Paper Awards
Follow-up to Dream 7B, now focused on code: Dream-Coder 7B is a diffusion-based code LLM from HKU + Huawei Noah’s Ark, built on Qwen2.5-Coder and 322B open tokens. It replaces autoregressive decoding with denoising-based generation, enabling flexible infilling via DreamOn. A…
Dream 7B is a 7B open diffusion language model co-developed by Huawei Noah’s Ark Lab, designed as a scalable, controllable alternative to autoregressive LLMs. It matches or outperforms AR models of similar size on general, math, and coding benchmarks, and demonstrates strong…
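For readers new to diffusion LMs, here is a toy sketch of the iterative-unmasking decode loop that replaces left-to-right autoregressive generation. The real Dream model scores positions with a transformer denoiser; a random stub stands in below, so only the control flow is meaningful (the scorer, vocabulary, and unmasking schedule are all illustrative assumptions, not Dream 7B's actual design).

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def stub_scorer(seq):
    """Stand-in for the denoiser: (token, confidence) for each masked slot."""
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(seq) if t == MASK}

def diffusion_decode(length=5, steps=5):
    """Start fully masked; each step commits the most confident predictions."""
    seq = [MASK] * length
    per_step = max(1, length // steps)   # simple linear unmasking schedule
    while MASK in seq:
        preds = stub_scorer(seq)
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:per_step]
        for pos, (tok, _) in best:
            seq[pos] = tok               # keep the rest masked for later steps
    return seq

out = diffusion_decode()
assert MASK not in out and len(out) == 5
```

Because every position is predicted in parallel at each step, this style of decoding can fill tokens in any order, which is what makes infilling and controllable generation natural for dLLMs.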
Dream-Coder: a 7B diffusion model for even better coding performance!!🤗
🚀 Thrilled to announce Dream-Coder 7B — the most powerful open diffusion code LLM to date.
Supporting variable-length generation is definitely a big step for diffusion language models. Check out DreamOn: great work from Zirui😎!!
We present DreamOn: a simple yet effective method for variable-length generation in diffusion language models. Our approach boosts code infilling performance significantly and even catches up with oracle results.
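To illustrate what "beyond a fixed canvas" means, here is an abstract sketch of variable-length infilling in the spirit of DreamOn: besides resolving to a vocabulary token, a masked slot may resolve to a control action that grows or shrinks the canvas. The action names and mechanics below are illustrative assumptions, not DreamOn's actual design.

```python
MASK, EXPAND, DELETE = "<mask>", "<expand>", "<delete>"

def apply_actions(seq, actions):
    """Rewrite the canvas: EXPAND turns one mask into two, DELETE drops it."""
    out = []
    for i, tok in enumerate(seq):
        act = actions.get(i, tok)
        if act == EXPAND:
            out.extend([MASK, MASK])   # canvas grows by one slot
        elif act == DELETE:
            pass                       # canvas shrinks by one slot
        else:
            out.append(act)
    return out

seq = ["def", "f", "(", MASK, ")", ":"]
assert apply_actions(seq, {3: EXPAND}) == ["def", "f", "(", MASK, MASK, ")", ":"]
assert apply_actions(seq, {3: DELETE}) == ["def", "f", "(", ")", ":"]
```

The point is that the infilled region no longer needs to match the number of mask tokens chosen up front; the model can decide the length as it denoises.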
📢 Update: Announcing Dream's next-phase development. - Dream-Coder 7B: A fully open diffusion LLM for code delivering strong performance, trained exclusively on public data. - DreamOn: targeting the variable-length generation problem in dLLM!
Introducing Dream-Coder 7B -- pushing forward with diffusion language models for code generation💻
Excited to share DreamOn—our latest work teaching diffusion LMs to dynamically expand and contract beyond fixed-size canvases!
Can we build an operating system entirely powered by neural networks? Introducing NeuralOS: towards a generative OS that directly predicts screen images from user inputs. Try it live: neural-os.com Paper: huggingface.co/papers/2507.08… Inspired by @karpathy's vision. 1/5
"Chatting" with LLM feels like using an 80s computer terminal. The GUI hasn't been invented, yet but imo some properties of it can start to be predicted. 1 it will be visual (like GUIs of the past) because vision (pictures, charts, animations, not so much reading) is the 10-lane…