Haibin
@eric_haibin_lin
Bytedance Seed LLM Systems, @verl_project, author of MegaScale. Prev. Amazon AI, ex-committer for BytePS, Gluon-NLP, @ApacheMXNet
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE…
We continue to provide popular open-source post-training recipes as a community. Recently we worked with @lmsysorg @itsyuhao to bring: - SPIN, an online DPO approach for alignment - Self-play preference optimization (SPPO) github.com/volcengine/ver…
verl’s agent loop is a general interface for multi-turn rollout and agentic RL, making it easy to plug in any agent framework
Code for reproducing ReTool with verl is released, scoring a mean@30 of 60 on AIME25: github.com/volcengine/ver… In addition, the general agent loop interface in verl v0.5 supports both the OpenAI ChatCompletion API & token-in-token-out generation: verl.readthedocs.io/en/latest/adva…
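A minimal sketch of what a multi-turn agent loop looks like in the ChatCompletion style: the model proposes an action, a tool executes it, and the result is appended to the transcript until the model answers directly. The names here (`run_agent_loop`, `stub_model`, the `tool` field) are illustrative, not verl's actual API:

```python
# Illustrative multi-turn agent loop in the ChatCompletion message style.
# Names (run_agent_loop, model_fn, tools) are hypothetical, not verl's API.

def run_agent_loop(model_fn, tools, user_prompt, max_turns=4):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = model_fn(messages)  # messages in, assistant message out
        messages.append(reply)
        tool_name = reply.get("tool")
        if tool_name is None:       # model answered directly -> rollout ends
            break
        result = tools[tool_name](reply["content"])
        messages.append({"role": "tool", "content": result})
    return messages

# Stub model: calls the calculator once, then repeats the tool result.
def stub_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "tool": "calc", "content": "6*7"}
    return {"role": "assistant", "tool": None, "content": messages[-1]["content"]}

transcript = run_agent_loop(stub_model, {"calc": lambda e: str(eval(e))}, "What is 6*7?")
print(transcript[-1]["content"])  # → 42
```

In a token-in-token-out setup the same loop runs over token IDs instead of message dicts, which avoids re-tokenization between turns.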
Congrats to @cognition_labs! Looking forward to having both foreground & background coding agents working together. In fact, Devin is already a top-20 committer to @verl_project. A great assistant for creating well-tested and documented code.
Cognition has signed a definitive agreement to acquire Windsurf. The acquisition includes Windsurf’s IP, product, trademark and brand, and strong business. Above all, it includes Windsurf’s world-class people, whom we’re privileged to welcome to our team. We are also honoring…
The 1st verl meetup will be held at ICML Vancouver on July 16th! Please join us if you will be there! lu.ma/0ek2nyao (onsite only) Featuring speakers from verl & SGLang dev team, plus @BeidiChen from @InfiniAILab and @jxwuyi from Ant RL Lab #verl #ICML #Vancouver
If you're in Singapore on 7/11, do not miss this meetup! Talks from the verl community: - LLMs to optimize code performance on real-world repos & verl project updates @sivil_taram - Long-horizon LLM agent training with verl-agent @langfengq Link: lu.ma/e498qhsi
🚀🚀🚀
DeepSeek 671b and Qwen3 236b support with Megatron backend is now available as preview in verl v0.4.0 🔥🔥🔥 We will continue optimizing MoE model performance down the road. DeepSeek 671b: verl.readthedocs.io/en/latest/perf… verl v0.4: github.com/volcengine/ver…
🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws 🤔 How to effectively build a powerful reasoning agent? Existing compute-optimal scaling laws suggest 64K thinking tokens + 1.7B model > 32B model. But it only shows half of the picture! 🚨 The O(N²)…
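The quadratic attention point can be made with back-of-envelope arithmetic: doubling the thinking-token budget roughly quadruples attention compute. This is my own illustration of the scaling argument, not the paper's cost model:

```python
# Back-of-envelope: self-attention scales as O(N^2) in sequence length N,
# so longer "thinking" budgets get quadratically more expensive.
# Illustration of the scaling argument only, not the paper's cost model.

def attention_pair_count(n_tokens):
    # causal attention: each token attends to itself and all previous tokens
    return n_tokens * (n_tokens + 1) // 2

short, long = 32_000, 64_000
ratio = attention_pair_count(long) / attention_pair_count(short)
print(f"{ratio:.2f}x")  # doubling tokens ≈ 4x the attention compute
```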
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…
Multi-GPU LoRA RL is now available in verl! It enables 70B+ model RL with 8 GPUs in bf16. Getting started: verl.readthedocs.io/en/latest/adva… Credit to: Simon Huang, @vermouth1992 @stephenx_ @jiayi_pirate @LongTonyLian skepsun, Weitao Feng, Alexey Malakhov, and many in the community
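Rough arithmetic on why LoRA makes 70B-scale RL feasible on 8 GPUs in bf16: the frozen base weights shard across GPUs, and only the small adapters need gradients and optimizer state. The LoRA sizing below is a hypothetical config, not a measured footprint:

```python
# Rough memory arithmetic behind "70B+ RL on 8 GPUs with LoRA in bf16".
# Numbers are illustrative; the LoRA fraction is a hypothetical config.

BYTES_BF16 = 2
params_base = 70e9
n_gpus = 8

# Sharded frozen base weights per GPU (no gradients, no optimizer state)
base_gb_per_gpu = params_base * BYTES_BF16 / n_gpus / 1e9

# Hypothetical adapters covering ~0.5% of parameter volume;
# grads + Adam moments ≈ 16 bytes per trainable parameter
lora_params = 0.005 * params_base
lora_state_gb = lora_params * 16 / 1e9

print(f"base weights/GPU: {base_gb_per_gpu:.1f} GB")  # 17.5 GB
print(f"adapter + optimizer state total: {lora_state_gb:.1f} GB")
```

Full-parameter bf16 training would instead need grads and optimizer state for all 70B parameters, which is far beyond 8 GPUs of typical HBM.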
"verl-agent" is also the official repo for paper "Group-in-Group Policy Optimization for LLM Agent Training" arxiv.org/abs/2505.10978 Here are some results (find wandb logs here: github.com/langfengQ/verl…)
SkyRL is a great work extending @verl_project with environments for agent tasks. It leverages the sglang multi-turn/tool calling feature recently added to verl: github.com/zhaochenyang20…
1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks like SWE-Bench. We also open-source a series of our early trained models to showcase the potential of end-to-end online RL training on long-horizon (20-50…
Very efficient yet powerful 8B coding model trained with @verl_project. Check it out! bytedance-seed-coder.github.io
🚀 Thrilled to introduce Seed-Coder, our new open-source family of 🔥powerful, 🔍transparent, ⚡parameter-efficient code models at 8B scale! 💥 Small model, big results! 🥇 BigCodeBench, FullStack Bench, MHPP: impressive results among lightweight open-source models, including…
verl is embracing @PyTorch FSDP2! Better throughput, lower memory usage, and composability with torch.compile! Please try it out and give us feedback: github.com/volcengine/ver…
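To my understanding, switching a verl run between the FSDP and FSDP2 backends is a strategy flag; the exact key names below are my assumption and may differ by verl version, so check the docs for your release:

```shell
# Hedged sketch: flag names assumed, verify against your verl version.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.actor.strategy=fsdp2 \
    critic.strategy=fsdp2 \
    ... # remaining data/model/rollout config unchanged
```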
Qwen3 reinforcement learning support with both Megatron and FSDP backends is now available in verl! github.com/volcengine/ver… qwen.readthedocs.io/en/latest/trai…
verl provides day-1 RL support for Qwen3 models! Both sequence parallelism and remove-padding acceleration are available. Run it with @vllm_project >=v0.8.4 or sglang v0.4.6.post1 with your target model (e.g. model_path=Qwen/Qwen3-30B-A3B)
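An illustrative launch fragment enabling the two accelerations mentioned above; the flag names are my assumption from verl's config conventions and should be checked against the version you run:

```shell
# Hedged sketch: key names assumed, verify against your verl version.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.path=Qwen/Qwen3-30B-A3B \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    ... # remaining data/reward/trainer config
```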
Join us at the ICLR Expo Talk Panel "verl: Flexible and Efficient Infrastructures for Post-training LLMs" from ByteDance Seed! - Link: iclr.cc/Expo/Conferenc… - Time: Sat 26 Apr (Today!), 1 p.m. - 2 p.m. +08 - Location: Peridot 202-203 (2nd Floor in Expo) - Contents: - "verl:…
Deploy verl on AMD GPUs for fast, scalable RLHF training with ROCm optimization, docker scripts, and impressive throughput-convergence results 🚀🚀🚀 rocm.blogs.amd.com/artificial-int…