Lingpeng Kong
@ikekong
Assistant Professor @ The University of Hong Kong; previously Research Scientist @ DeepMind
Xinyu Yang from CMU will be giving a talk titled "Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation" on Friday, July 25 at 11am HKT (Thursday, July 24 at 8pm PDT). Link to talk: hku.zoom.us/j/92651812689?…
📢 Update: Announcing Dream's next-phase development.
- Dream-Coder 7B: a fully open diffusion LLM for code, delivering strong performance and trained exclusively on public data.
- DreamOn: targeting the variable-length generation problem in dLLMs!
🚀 Thrilled to announce Dream-Coder 7B — the most powerful open diffusion code LLM to date.
We present DreamOn: a simple yet effective method for variable-length generation in diffusion language models. Our approach significantly boosts code-infilling performance and even matches oracle results.
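For a rough picture of what "variable-length generation" means in a dLLM, here is a minimal toy in which a masked slot can either commit a token, expand into two masks, or delete itself, so the infill span grows and shrinks during decoding. The action names and the random policy are illustrative assumptions, not the released DreamOn interface:

```python
import random

MASK, EXPAND, DELETE = "<m>", "<expand>", "<delete>"
random.seed(0)

def toy_policy(seq, i):
    # Stand-in for the model: for a masked slot, pick a concrete token
    # ("pass" here) or one of the two hypothetical length-edit actions.
    return random.choice(["pass", "pass", "pass", EXPAND, DELETE])

def infill(prefix, suffix, n_masks=3, max_steps=20):
    seq = prefix + [MASK] * n_masks + suffix
    for _ in range(max_steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        i = masked[0]
        action = toy_policy(seq, i)
        if action == EXPAND:
            seq[i:i + 1] = [MASK, MASK]   # span grows by one slot
        elif action == DELETE:
            seq[i:i + 1] = []             # span shrinks by one slot
        else:
            seq[i] = action               # commit a concrete token
    return seq

print(infill(["def", "f", "("], [")", ":"]))
```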
🌳TreeSynth: Synthesizing large-scale diverse data from scratch! Struggling with repetition and space collapse in data synthesis? 🤔 Our latest research mitigates this challenge through innovative tree-guided subspace partitioning. ✨Introducing TreeSynth—a novel framework…
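For intuition on tree-guided subspace partitioning, a minimal sketch that treats the "tree" as a cross product of attribute axes and emits one synthesis prompt per leaf, so diverse coverage is enforced by construction rather than hoped for from sampling. The axes and prompt template are invented for illustration, not TreeSynth's actual taxonomy:

```python
from itertools import product

# Hypothetical attribute axes spanning a math-problem space.
AXES = {
    "topic": ["algebra", "geometry", "combinatorics"],
    "difficulty": ["intro", "olympiad"],
    "format": ["proof", "numeric-answer"],
}

def leaves(axes):
    # Here the "tree" is simply the full cross product of the axes; a real
    # system would grow splits adaptively and prune empty subspaces.
    keys = list(axes)
    for combo in product(*(axes[k] for k in keys)):
        yield dict(zip(keys, combo))

def prompt_for(leaf):
    # One synthesis prompt per leaf of the partition.
    return ("Write an {difficulty} {topic} problem with a {format} "
            "solution.".format(**leaf))

for leaf in leaves(AXES):
    print(prompt_for(leaf))
```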
🤖Can diffusion models write code competitively? Excited to share our latest 7B coding diffusion LLM!!💻 With DiffuCoder, we explore how they decode, why temperature🔥 matters, and how to improve them via coupled-GRPO that speaks diffusion!!📈 Code: github.com/apple/ml-diffu… 🧵
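For readers new to dLLM decoding, a minimal sketch of the common confidence-based unmasking loop and where temperature enters; the "model" below is a random stand-in, and none of this is the actual DiffuCoder code:

```python
import numpy as np

VOCAB, MASK = 100, -1
rng = np.random.default_rng(0)

def model_logits(tokens):
    # Stand-in for a masked diffusion LM: per-position logits over the vocab.
    return rng.normal(size=(len(tokens), VOCAB))

def decode(seq_len=16, steps=4, temperature=1.0):
    tokens = np.full(seq_len, MASK)
    per_step = seq_len // steps
    for _ in range(steps):
        logits = model_logits(tokens) / temperature
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        # Sample a candidate token for every position, then score confidence.
        cand = np.array([rng.choice(VOCAB, p=p) for p in probs])
        conf = probs[np.arange(seq_len), cand]
        conf[tokens != MASK] = -np.inf   # never touch committed tokens
        # Commit only the most confident positions this step; temperature
        # reshapes probs, so it changes which positions look confident.
        for i in np.argsort(conf)[-per_step:]:
            tokens[i] = cand[i]
    return tokens

print(decode())
```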
Huge milestone from the team! A blazing-fast diffusion LLM built for chat, delivering real-time performance at commercial scale. If you liked Mercury Coder for code, you'll love this for conversation.
We’re excited to launch Mercury, the first commercial-scale diffusion LLM tailored for chat applications! Ultra-fast and efficient, Mercury brings real-time responsiveness to conversations, just like Mercury Coder did for code.
Thanks for sharing our work!!! 🙏 Code release is in progress 😺
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
Apple introduces DiffuCoder, a 7B diffusion LLM trained on 130B tokens of code. The authors also propose a diffusion-native RL training framework, coupled-GRPO. Decoding of dLLMs differs from…
MiMo-VL technical report, models, and evaluation suite are out! 🤗 Models: huggingface.co/XiaomiMiMo/MiM… (or RL) Report: arxiv.org/abs/2506.03569 Evaluation Suite: github.com/XiaomiMiMo/lmm… Looking back, it's incredible that we delivered such compact yet powerful vision-language…
The RL recipe from us, with everything fully open! hkunlp.github.io/blog/2025/Pola…
🚨 4B open-recipe model beats Claude-4-Opus
🔓 100% open data, recipe, model weights and code.
Introducing Polaris✨: a post-training recipe for scaling RL on advanced reasoning models.
🥳 Check out how we boost open-recipe reasoning models to incredible performance levels…
🔬 The HKU team presents ParallelComp: a training-free technique for efficient context length extrapolation in LLMs—from 8K up to 128K tokens—on a single A100 GPU, with minimal performance loss. 📄 Paper: arxiv.org/abs/2502.14317 💻 Code: github.com/menik1126/Para…
Constant-memory long CoT is here!
🔥 Meet PromptCoT-Mamba
The first reasoning model with constant-memory inference to beat Transformers on competition-level math & code
⚡ Efficient decoding: no attention, no KV cache
⚡ +16.0% / +7.1% / +16.6% vs. s1.1-7B on AIME 24 / 25 / LiveCodeBench
🚀 Up to 3.66× faster
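To see why state-space decoding is constant-memory, here is a toy linear recurrence: the entire history is folded into a fixed-size state, so nothing grows with sequence length, whereas a Transformer's KV cache gains one entry per generated token. This is illustrative only, not the actual PromptCoT-Mamba kernels:

```python
import numpy as np

d_state, d_model = 16, 8
rng = np.random.default_rng(0)
A = rng.normal(scale=0.05, size=(d_state, d_state))  # state transition
B = rng.normal(size=(d_state, d_model))              # input projection
C = rng.normal(size=(d_model, d_state))              # output readout

h = np.zeros(d_state)
for t in range(10_000):          # 10k "tokens", still O(d_state) memory
    x = rng.normal(size=d_model)  # stand-in token embedding
    h = A @ h + B @ x             # constant-size state update
    y = C @ h                     # next-token representation
print(h.shape, y.shape)           # (16,) (8,) regardless of length
```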
We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
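A toy picture of that refinement loop, with a fixed vector standing in for the model's prediction of the clean output; real text diffusion works over token embeddings or discrete masks, so this scalar demo only shows the shape of the iteration, not Gemini Diffusion's method:

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])   # stand-in for the "clean" output
x = rng.normal(size=3)                # start from pure noise

steps = 10
for t in range(steps, 0, -1):
    x0_hat = target                   # a trained model would predict this
    keep = (t - 1) / steps            # noise level shrinks every step
    x = keep * x + (1 - keep) * x0_hat
    print(f"step {steps - t + 1}: {np.round(x, 3)}")
# The final step has keep == 0, so x lands exactly on the estimate.
```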
I waited until like 1 am, but it turned out to be 5 am!!
Introducing Qwen3! We release Qwen3, our latest large language models with open weights, including 2 MoE models and 6 dense models ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general…
Excited to share our work at ICLR 2025 in 🇸🇬 @iclr_conf 🥳 Happy to chat about LLM reasoning & planning, agents, and AI4Science! 📍 Sat 26 Apr, 3:00–5:30 p.m. CST, Hall 3 + Hall 2B #554