Wenhao Zhu
@Wenhao_NLP
AI researcher @ByteDance Seed | prev. @EdinburghNLP | Multilingual LLM & machine translation
📢 Participate in the *WMT25 terminology task* to showcase how you customise translations! What's new? More languages, more domains, sentence- and document-level tracks, and a Pareto trade-off between term accuracy and overall quality. Don't miss it cuz it only happens once every two years. statmt.org/wmt25/terminol…
The video in the link will surprise you. Trust me!
Not a social media / X person, but still glad to announce Seed LiveInterpret 2.0. In short, it is an end-to-end, full-duplex speech-to-speech simultaneous interpretation model that achieves high-quality, ultra-low-latency S2S translation. Website: seed.bytedance.com/en/seed_livein…
Could multi-turn interaction be the next promising direction for scaling?
🚀 Call for Papers — @NeurIPSConf 2025 Workshop Multi-Turn Interactions in LLMs 📅 December 6/7 · 📍 San Diego Convention Center Join us to shape the future of interactive AI. Topics include but are not limited to: 🧠 Multi-Turn RL for Agentic Tasks (e.g., web & GUI agents,…
🚀 Introducing Prefix-RFT to blend SFT and RFT! SFT can learn harder problems by imitating demonstrations but may generalize poorly; RFT generalizes better overall but is limited by the initial policy. Our method, Prefix-RFT, makes the best of both worlds!
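Not the authors' code, just a toy sketch of how such a blend could look: imitate a prefix of an expert demonstration, let the policy complete the rest, and weight that continuation by a verifiable reward. The policy interface, `prefix_frac`, and `beta` below are made-up placeholders.

```python
# Hypothetical sketch of the prefix-blending idea (not the paper's implementation):
# imitation loss on a demonstration prefix + reward-weighted loss on the policy's
# own continuation of that prefix.
import random

class ToyPolicy:
    """Stand-in for an LLM policy; only the interface matters here."""
    def log_prob(self, context, tokens):
        return -0.1 * len(tokens)          # pretend log-probability

    def sample(self, context, max_new=5):
        return [random.choice("0123456789") for _ in range(max_new)]

def prefix_rft_loss(policy, prompt, demo, reward_fn, prefix_frac=0.5, beta=1.0):
    k = int(len(demo) * prefix_frac)       # how much of the demonstration to keep
    prefix = demo[:k]

    sft_loss = -policy.log_prob(prompt, prefix)                       # imitate the prefix
    continuation = policy.sample(prompt + prefix)                     # policy finishes the solution
    reward = reward_fn(prompt, prefix + continuation)                 # verifiable reward, e.g. answer check
    rft_loss = -reward * policy.log_prob(prompt + prefix, continuation)

    return sft_loss + beta * rft_loss

if __name__ == "__main__":
    demo = list("3+4=7")                                              # toy "expert" solution
    reward = lambda prompt, resp: float("".join(resp).endswith("7"))  # toy verifier
    print(prefix_rft_loss(ToyPolicy(), list("solve:"), demo, reward))
```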
Check this out if you are working on RL!
The RL codebase I like the most:
- The NanoGPT of RL
- Supports multi-turn RL
- Just 1k lines of Python
- Data, Tensor, and Sequence Parallelism
github.com/ChenmienTan/RL2
ByteDance Seed released Seed-X, a Mistral-7B-shaped LLM specialized for translation, apparently pretrained on ≈6.4B tokens, equaling the likes of R1 and 2.5-Pro in human evaluation. «We deliberately exclude STEM, coding, and reasoning-focused data» lol unexpected data paper
Introducing RL2: Ray-Less Reinforcement Learning for LLMs 🚀 Want to run RL experiments but tired of complicated abstractions? We've got you covered with a <1K-line PPO/REINFORCE implementation:
🎯 Ray-less = launch RL experiments with torchrun, just like SFT
⚡ Long-context…
Why do Long Context Language Models (LCLMs) excel at needle-in-a-haystack tasks but struggle with real-world applications? Can we evaluate them in a fully controlled setting? 🎉 Introducing our latest work: "A Controllable Examination for Long-Context Language Models" TL;DR:…
😕 Feeling frustrated with this round of ACL Rolling Review (February). The interaction between reviewers and authors seems to have deteriorated compared to previous rounds.
- As an Area Chair, I noticed almost no reviewers responded or updated their reviews after the rebuttal…
🚀 New Paper Alert! 🚀 We introduce Q-Filters, a training-free method for efficient KV Cache compression! It is compatible with FlashAttention and can compress along generation which is particularly useful for reasoning models ⚡ ⬇️R1-Distill-Llama-8B with 128 KV pairs ⬇️ 🧵
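I haven't checked the exact Q-Filters scoring rule, so the snippet below is only a generic illustration of compress-as-you-generate KV-cache pruning down to a fixed budget (e.g. the 128 pairs mentioned above); the key-norm score is a placeholder heuristic, not the paper's criterion.

```python
# Generic sketch of on-the-fly KV-cache pruning to a fixed budget.
# The scoring rule here is a placeholder, NOT the Q-Filters criterion.
import torch

def prune_kv_cache(keys, values, scores, budget=128):
    """Keep only the `budget` highest-scoring cached positions.

    keys, values: [seq_len, num_heads, head_dim]
    scores      : [seq_len] importance score per cached position
    """
    if keys.shape[0] <= budget:
        return keys, values
    keep = torch.topk(scores, k=budget).indices.sort().values   # preserve original order
    return keys[keep], values[keep]

# Toy usage: score each position by its key norm (placeholder heuristic).
keys = torch.randn(512, 8, 64)
values = torch.randn(512, 8, 64)
scores = keys.norm(dim=(-2, -1))
keys, values = prune_kv_cache(keys, values, scores, budget=128)
print(keys.shape)  # torch.Size([128, 8, 64])
```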
Tired of mGSM & multilingual MMLU? Saturated performance, limited task types & complexity... Academic researchers and industry LLM teams alike need a better way to comprehensively evaluate LLM multilingual capabilities. Introducing BenchMAX! Maximizing the spectrum of…
🤩Excited to announce our new work BenchMAX!🥳 BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper: huggingface.co/papers/2502.07… Repo: github.com/CONE-MT/BenchM… Datasets: huggingface.co/collections/LL…
What Dance Would You Like to Perform with Unitree G1? With the upgraded algorithm, G1 can learn any dance. Leave a comment to tell us what dance you'd like to see!😘 #Unitree #AGI #EmbodiedAI #SpringFestivalGalaRobot #AI #Humanoid #Bipedal #WorldModel #Dance
We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on 7B model with only 8K examples, the results are surprisingly strong. 🚀 Starting from Qwen2.5-Math-7B (base model), we perform RL on it directly. No SFT, no reward model, just 8K MATH examples for verification, the…
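For context, "examples for verification" suggests a purely rule-based reward with no learned reward model. A minimal sketch of what such a verifier could look like (my guess, not the repo's code): extract the final boxed answer from the response and compare it with the reference.

```python
# Hypothetical rule-based verification reward for math RL (no reward model).
import re

def extract_boxed(text: str) -> str | None:
    """Return the content of the last \\boxed{...} in a model response."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def verification_reward(response: str, reference: str) -> float:
    """1.0 if the extracted answer matches the reference exactly, else 0.0."""
    pred = extract_boxed(response)
    return 1.0 if pred is not None and pred == reference.strip() else 0.0

print(verification_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
print(verification_reward("no boxed answer here", "42"))              # 0.0
```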