verl project
@verl_project
Open RL library for LLMs. https://github.com/volcengine/verl Join us on http://verl-project.slack.com
Remember NoisyStudent topping ImageNet back in 2019🏆? Was it the last dance of noisy training? 🍻 Meet NoisyRollout, our new noisy-training effort toward building stronger o1-like visual reasoners. ✨ With only 2.1k training examples and zero additional training cost, it hits…
Nice new results from @GoogleAI researchers on improving the state-of-the-art on ImageNet! "We...train a...model on...ImageNet...& use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger...model on the...labeled & pseudo labeled images."
“What is the answer to 1 + 1?” Large Reasoning Models (LRMs) may generate 1500+ tokens just to answer this trivial question. Too much thinking 🤯 Can LRMs be both Faster AND Stronger? Yes. Introducing LASER💥: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping…
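The tweet is truncated, but the core idea of adaptive length-based reward shaping can be sketched: correct answers keep full reward while short, and correct-but-long ones are discounted, so the policy learns to stop overthinking. A minimal sketch, assuming a simple linear decay beyond a fixed token budget (the actual LASER formulation is in the paper; `budget` and the decay rule here are illustrative):

```python
# Illustrative length-based reward shaping (not the exact LASER rule).
# Assumption: a correct answer keeps full reward within a token budget,
# then decays linearly, discouraging needlessly long reasoning.

def shaped_reward(is_correct: bool, num_tokens: int, budget: int = 1024) -> float:
    if not is_correct:
        return 0.0          # no credit for wrong answers, whatever the length
    if num_tokens <= budget:
        return 1.0          # full reward inside the budget
    overshoot = num_tokens - budget
    return max(0.0, 1.0 - overshoot / budget)  # decay beyond the budget

# A 1500-token answer to "1 + 1" earns ~0.54; a 20-token one earns 1.0.
print(shaped_reward(True, 1500), shaped_reward(True, 20))
```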
1/ Long chain-of-thought (CoT) reasoning boosts LLM performance, but it comes with computational overhead. Check out our new paper, ThinkPrune, where we explore a simple question: to what extent can we cut the reasoning length while keeping the quality? We show that by simply adding a hard…
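One natural way to impose a hard limit during RL, sketched below: zero out the reward of any rollout that exceeds the current length cap, then tighten the cap over successive rounds. This is an assumption-laden illustration, not necessarily ThinkPrune's exact recipe:

```python
# Sketch: hard length clipping in the RL reward, tightened per round.
# Assumptions (not from the tweet): over-cap rollouts get zero reward,
# and the cap shrinks on a fixed schedule across training rounds.

def clipped_reward(base_reward: float, num_tokens: int, cap: int) -> float:
    return base_reward if num_tokens <= cap else 0.0

caps = [4096, 3072, 2048]  # illustrative pruning schedule
for round_idx, cap in enumerate(caps):
    # each round would run PPO/GRPO using clipped_reward(..., cap=cap)
    print(f"round {round_idx}: reward zeroed beyond {cap} tokens")
```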
🚨 4B open-recipe model beats Claude-4-Opus 🔓 100% open data, recipe, model weights, and code. Introducing Polaris✨, a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels…
MemAgent: MemAgent-14B is trained on 32K-length documents with an 8K context window, yet achieves >76% accuracy even at 3.5M tokens! That consistency is crazy! Here are my notes:
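The trick that lets an 8K window cover 3.5M tokens: the agent reads the document in chunks and carries a fixed-size memory between them, so each forward pass only ever sees one chunk plus the current memory. A minimal sketch of that loop (the chunk size, prompts, and `llm` callable are illustrative stand-ins, not MemAgent's actual interface):

```python
# Sketch: chunked reading with a fixed-budget memory carried across chunks.
# Assumption: `llm(prompt)` is any text-completion function; prompts and the
# 4K-token chunk size are illustrative.

def answer_long_document(llm, doc_tokens: list[str], question: str,
                         chunk_size: int = 4096) -> str:
    memory = ""  # fixed-size scratchpad, overwritten as reading proceeds
    for start in range(0, len(doc_tokens), chunk_size):
        chunk = " ".join(doc_tokens[start:start + chunk_size])
        # each call fits the context window: one chunk + current memory
        memory = llm(
            f"Question: {question}\nMemory: {memory}\n"
            f"New text: {chunk}\nRewrite the memory, keeping what matters."
        )
    return llm(f"Question: {question}\nMemory: {memory}\nAnswer:")
```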
The 1st verl meetup will be held at ICML Vancouver on July 16th! Please join us if you will be there! lu.ma/0ek2nyao (onsite only) Featuring speakers from verl & SGLang dev team, plus @BeidiChen from @InfiniAILab and @jxwuyi from Ant RL Lab #verl #ICML #Vancouver
If you're in Singapore on 7/11, do not miss this meetup! Talks from the verl community: - LLMs to optimize code performance on real-world repos & verl project updates @sivil_taram - Long-horizon LLM agent training with verl-agent @langfengq Link: lu.ma/e498qhsi
Open-source "verl-agent" codebase is evolving fast⚡ A scalable, multi-turn reinforcement learning framework for training LLM/VLM-based agents — now with rich features! (see summary in image below🔽) 🚀 Try it out and train your own LLM agents 📎 GitHub: github.com/langfengQ/verl…
🚀 Excited to introduce our latest work GRESO: a method that identifies and skips millions of uninformative prompts before rollout, achieving up to a 2.0x wall-clock speedup in training. More rollouts lead to better model performance, but they’re also a major bottleneck in…
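The intuition behind skipping prompts before rollout: in group-based methods like GRPO, a prompt whose sampled responses come out all-correct or all-wrong has zero advantage and contributes no gradient, so its rollouts are wasted compute. A sketch of pre-rollout filtering under that assumption (GRESO's actual criterion is based on training dynamics; the `history` heuristic here is an illustrative stand-in):

```python
# Sketch: drop prompts unlikely to yield a learning signal before rollout.
# In GRPO, an all-correct or all-wrong group has zero advantage. Assumption:
# we predict that from each prompt's recent accuracy history.

def is_informative(history: list[float], eps: float = 0.05) -> bool:
    """history: recent per-rollout accuracies (0.0..1.0) for this prompt."""
    if not history:
        return True  # never sampled; must roll out to find out
    mean_acc = sum(history) / len(history)
    # near-0 or near-1 accuracy implies near-zero advantage
    return eps < mean_acc < 1.0 - eps

prompts = {"p1": [1.0, 1.0, 1.0], "p2": [0.25, 0.5], "p3": []}
print([p for p, h in prompts.items() if is_informative(h)])  # ['p2', 'p3']
```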
🚀 Thrilled to unveil ReVisual-R1! Our 7B open-source MLLM achieves long, accurate & thoughtful reasoning! 🔥 SOTA on 9 key benchmarks! Including AIME24 (53.3) & MathVision (48.8). Overall +16.8% avg! 📈 📄 Paper: arxiv.org/pdf/2506.04207 💻 Code: github.com/CSfufu/Revisua…
💥Async RL rollouts that are 75% faster than other async implementations (see the sketch below):
- removing all synchronous parts of the rollout
- a single step in a multi-turn rollout is independent of, and async with respect to, all other completions
- completions can finish independently of other completions
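A toy picture of what fully asynchronous rollouts look like: every completion is its own task, the turns of one rollout await only their own history, and results are consumed as each finishes rather than at a batch barrier. `generate` below is a placeholder for a real async inference call, not the actual implementation:

```python
# Sketch: fully async multi-turn rollouts; no batch-level synchronization.
import asyncio
import random

async def generate(prompt: str) -> str:
    await asyncio.sleep(random.uniform(0.1, 1.0))  # simulated server latency
    return prompt + " <turn>"

async def rollout(prompt: str, num_turns: int = 3) -> str:
    text = prompt
    for _ in range(num_turns):       # turns of ONE rollout run in order,
        text = await generate(text)  # independent of every other rollout
    return text

async def main():
    tasks = [asyncio.create_task(rollout(f"prompt-{i}")) for i in range(8)]
    for done in asyncio.as_completed(tasks):  # consume results as they land
        print(await done)

asyncio.run(main())
```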
DeepSeek 671b and Qwen3 236b support with the Megatron backend is now available as a preview in verl v0.4.0 🔥🔥🔥 We will continue optimizing MoE model performance down the road. DeepSeek 671b: verl.readthedocs.io/en/latest/perf… verl v0.4: github.com/volcengine/ver…

Distributed training on GPU clusters shouldn't be complex. Check out the latest blog on orchestrating reasoning agent training with RAGEN and @verl_project on our 1-Click Clusters, powered by @dstackai 🔗 lambda.ai/blog/agent-tra…
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…
Multi-GPU LoRA RL is now available in verl! It enables 70B+ model RL with 8 GPUs in bf16. Getting started: verl.readthedocs.io/en/latest/adva… Credit to: Simon Huang, @vermouth1992 @stephenx_ @jiayi_pirate @LongTonyLian skepsun, Weitao Feng, Alexey Malakhov, and many in the community
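For a taste of what enabling it looks like, LoRA in verl is driven by model-config options such as a LoRA rank and alpha. The dict below is only a sketch of plausible Hydra-style override keys; the exact, current option names are in the linked docs:

```python
# Illustrative overrides for LoRA RL in verl (names/values are assumptions;
# verify against verl.readthedocs.io before use).
lora_overrides = {
    "actor_rollout_ref.model.lora_rank": 32,          # >0 enables LoRA
    "actor_rollout_ref.model.lora_alpha": 32,
    "actor_rollout_ref.model.target_modules": "all-linear",
}
```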
Guess it's the first open-source multi-turn e2e RL for GUI Agents from academia, and it's based on UI-TARS-1.5-7B. If you want to study multimodal Agent RL, it's a good starting point~ arxiv.org/abs/2505.16282
1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. 🚀 Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and SFT model…
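What makes the pipeline interactive: instead of emitting one SQL query blind, the policy can execute intermediate queries against a live database and refine based on observed rows or errors. A minimal sketch of such a multi-turn loop (the `llm` callable, prompts, and turn budget are illustrative assumptions, not SkyRL-SQL's actual interface):

```python
# Sketch: multi-turn Text-to-SQL where the model probes a real database.
import sqlite3

def interactive_sql(llm, db: sqlite3.Connection, question: str,
                    max_turns: int = 4) -> str:
    transcript = f"Question: {question}"
    sql = ""
    for _ in range(max_turns):
        sql = llm(transcript)  # propose or refine a query from the transcript
        try:
            rows = db.execute(sql).fetchall()
            feedback = f"rows: {rows[:5]}"   # observe real results
        except sqlite3.Error as exc:
            feedback = f"error: {exc}"       # or recover from a real error
        transcript += f"\nSQL: {sql}\nResult: {feedback}"
    return sql  # the final query is what the RL reward would score
```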