Lin Zheng
@linzhengisme
Ph.D. student @ HKU
🚀 Meet EvaByte: The best open-source tokenizer-free language model! Our 6.5B byte LM matches modern tokenizer-based LMs with 5x less data & 2x faster decoding, naturally extending to multimodal tasks while fixing tokenization quirks. 💻 Blog: bit.ly/3CjEmTC 🧵 1/9
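To make the "tokenizer-free" idea concrete, here is a minimal sketch of byte-level tokenization as used by byte LMs like EvaByte: every UTF-8 byte is its own token, so the vocabulary is just the 256 byte values (plus whatever special tokens a given model adds; EvaByte's actual specials and preprocessing are not shown and the helper names below are illustrative assumptions).

```python
def bytes_to_ids(text: str) -> list[int]:
    """Map text to token IDs: one ID per UTF-8 byte, no merge rules."""
    return list(text.encode("utf-8"))

def ids_to_text(ids: list[int]) -> str:
    """Invert the mapping; lossless for any valid UTF-8 byte sequence."""
    return bytes(ids).decode("utf-8")

ids = bytes_to_ids("héllo")           # 'é' spans two bytes in UTF-8
assert ids == [104, 195, 169, 108, 108, 111]
assert ids_to_text(ids) == "héllo"
assert max(ids) < 256                 # vocabulary never exceeds 256 bytes
```

Because the mapping is fixed and reversible, there are no tokenization quirks (no merge-dependent splits, no out-of-vocabulary pieces), at the cost of longer sequences per character.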

Wrapped up a SWE-Perf website redesign using Qwen3-Coder on AnyCoder (huggingface.co/spaces/akhaliq…). The process was incredibly fast and smooth! One question for Qwen devs, though: did you pretrain a secret love for the color purple into the coder's persona? 😉
The most rewarding moment in research: hearing someone say "This actually works in our scenario!" ✨
Countless iterations went into cooking it, but the process is satisfying. I still believe we could pour more data into each stage if we had more hands, so the potential is unlimited and the scaling law hasn't hit the wall yet! Towards Digital Agents🤖 We are already on the way.
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
Apart from the performance, it’s pure entertainment just watching Qwen3‑Coder build Qwen Code all by itself. Agentic coding is really something: it explores, understands, plans, and acts seamlessly. Honored to be “in the game”—even if my entire work so far is smashing the Enter…
Excited to bring Qwen3-Coder into the browser and terminal world! Building the scaffolding and environments for this big guy to play and learn is tough but incredibly "rewarding". Agentic coding and browsing are arguably the two most important skills for digital agents: they…
Tired of your 1T param language model loss plateauing ~0.6-1.3? Simple solution: cheat by learning a latent language with better characteristics than English! Provocative title aside, I explored whether machines could develop their own "language" optimized for AI vs humans. 🧵
Xinyu Yang from CMU will be giving a talk titled "Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation" on Friday, July 25 at 11am HKT (Thursday, July 24 at 8pm PDT). Link to talk: hku.zoom.us/j/92651812689?…
SWE-Perf Can Language Models Optimize Code Performance on Real-World Repositories?
🔥 LLMs can fix bugs, but can they make your code faster? We put them to the test on real-world repositories, and the results are in! 🚀 New paper: "SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?" Key findings: 1️⃣ We introduce SWE-Perf, the…
We should also turn our attention to the Dream series — an amazing research group that's steadily building the foundation for dLLMs.
What happened after Dream 7B? First, Dream-Coder 7B: a fully open diffusion LLM for code delivering strong performance, trained exclusively on public data. Plus, DreamOn cracks the variable-length generation problem! It enables code infilling that goes beyond a fixed canvas.
🧶1/ Diffusion-based LLMs (dLLMs) are fast & promising—but hard to fine-tune with RL. Why? Because their likelihoods are intractable, making common RL (like GRPO) inefficient & biased. 💡We present a novel method, 𝐰𝐝𝟏, that mitigates these headaches. Let’s break it down.👇
Incredible to see how fast the field moves since we worked on masked diffusion arxiv.org/abs/2406.04329. Huge congratulations to @Jaeyeon_Kim_0 @sitanch for the award!
#ICML2025 Outstanding Paper Awards
Follow-up to Dream 7B, now focused on code: Dream-Coder 7B is a diffusion-based code LLM from HKU + Huawei Noah’s Ark, built on Qwen2.5-Coder and 322B open tokens. It replaces autoregressive decoding with denoising-based generation, enabling flexible infilling via DreamOn. A…
Dream 7B is a 7B open diffusion language model co-developed by Huawei Noah’s Ark Lab, designed as a scalable, controllable alternative to autoregressive LLMs. It matches or outperforms AR models of similar size on general, math, and coding benchmarks, and demonstrates strong…
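For readers new to diffusion LMs, here is a toy sketch of the iterative-unmasking decode loop that replaces left-to-right autoregressive generation. The real Dream model scores positions with a transformer denoiser; a random stub stands in below, so only the control flow is meaningful (the scorer, vocabulary, and unmasking schedule are all illustrative assumptions, not Dream 7B's actual design).

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def stub_scorer(seq):
    """Stand-in for the denoiser: (token, confidence) for each masked slot."""
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(seq) if t == MASK}

def diffusion_decode(length=5, steps=5):
    """Start fully masked; each step commits the most confident predictions."""
    seq = [MASK] * length
    per_step = max(1, length // steps)   # simple linear unmasking schedule
    while MASK in seq:
        preds = stub_scorer(seq)
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:per_step]
        for pos, (tok, _) in best:
            seq[pos] = tok               # keep the rest masked for later steps
    return seq

out = diffusion_decode()
assert MASK not in out and len(out) == 5
```

Because every position is predicted in parallel at each step, this style of decoding can fill tokens in any order, which is what makes infilling and controllable generation natural for dLLMs.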
Dream-Coder: a 7B diffusion model for even better coding performance!!🤗
🚀 Thrilled to announce Dream-Coder 7B — the most powerful open diffusion code LLM to date.
Supporting variable-length generation is definitely a big step for diffusion language models. Check out DreamOn: great work from Zirui😎!!
We present DreamOn: a simple yet effective method for variable-length generation in diffusion language models. Our approach boosts code infilling performance significantly and even catches up with oracle results.
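To illustrate what "beyond a fixed canvas" means, here is an abstract sketch of variable-length infilling in the spirit of DreamOn: besides resolving to a vocabulary token, a masked slot may resolve to a control action that grows or shrinks the canvas. The action names and mechanics below are illustrative assumptions, not DreamOn's actual design.

```python
MASK, EXPAND, DELETE = "<mask>", "<expand>", "<delete>"

def apply_actions(seq, actions):
    """Rewrite the canvas: EXPAND turns one mask into two, DELETE drops it."""
    out = []
    for i, tok in enumerate(seq):
        act = actions.get(i, tok)
        if act == EXPAND:
            out.extend([MASK, MASK])   # canvas grows by one slot
        elif act == DELETE:
            pass                       # canvas shrinks by one slot
        else:
            out.append(act)
    return out

seq = ["def", "f", "(", MASK, ")", ":"]
assert apply_actions(seq, {3: EXPAND}) == ["def", "f", "(", MASK, MASK, ")", ":"]
assert apply_actions(seq, {3: DELETE}) == ["def", "f", "(", ")", ":"]
```

The point is that the infilled region no longer needs to match the number of mask tokens chosen up front; the model can decide the length as it denoises.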
📢 Update: Announcing Dream's next-phase development. - Dream-Coder 7B: A fully open diffusion LLM for code delivering strong performance, trained exclusively on public data. - DreamOn: targeting the variable-length generation problem in dLLM!
Introducing Dream-Coder 7B -- pushing forward with diffusion language models for code generation💻
Excited to share DreamOn—our latest work teaching diffusion LMs to dynamically expand and contract beyond fixed-size canvases!
Can we build an operating system entirely powered by neural networks? Introducing NeuralOS: towards a generative OS that directly predicts screen images from user inputs. Try it live: neural-os.com Paper: huggingface.co/papers/2507.08… Inspired by @karpathy's vision. 1/5
"Chatting" with LLM feels like using an 80s computer terminal. The GUI hasn't been invented, yet but imo some properties of it can start to be predicted. 1 it will be visual (like GUIs of the past) because vision (pictures, charts, animations, not so much reading) is the 10-lane…