Junhao Chen
@Cumquaaa
Senior @Tsinghua_Uni, previously interned @tsvetshop. My interests lie in NLP, CV and RL.
🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MADFormer, which uses half of its layers for AR and half for diffusion. AR gives a fast guess at the next patch, while diffusion helps refine the…

How do we ground #LLMs for scientific problems to mitigate hallucination? Check out our #icml2025 paper "Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation" Paper: arxiv.org/abs/2411.00412 Code:…
Our framework supports various online RL algorithms. In our experiments, we use GRPO with the following optimizations: 1️⃣ Pre-sampling Curriculum: dynamically filters out tasks whose sampled rollouts are either all correct or all incorrect, improving stability and training efficiency. 2️⃣ Exploration Encouragement:…
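A minimal sketch of the pre-sampling filter, assuming binary outcome rewards (1 = success, 0 = failure) and a standard GRPO group-normalized advantage. The function names, the dict-of-lists input format, and the use of population std are assumptions for illustration, not SimpleVLA-RL's actual code. The intuition: if every rollout in a group gets the same reward, each reward equals the group mean, so every advantage is zero and the task contributes no gradient signal.

```python
from statistics import mean, pstdev

def filter_groups(groups):
    """Keep only tasks whose rollout rewards are mixed (not all 0 and not all 1).

    groups: hypothetical format, dict of task_id -> list of binary rewards.
    """
    return {tid: rs for tid, rs in groups.items() if 0 < sum(rs) < len(rs)}

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

groups = {
    "task_a": [1, 1, 1, 1],  # all correct   -> zero advantage, filtered out
    "task_b": [0, 0, 0, 0],  # all incorrect -> zero advantage, filtered out
    "task_c": [1, 0, 0, 1],  # mixed         -> kept for the policy update
}

kept = filter_groups(groups)
print(list(kept))  # ['task_c']
for tid, rs in kept.items():
    print(tid, [round(a, 3) for a in group_advantages(rs)])  # task_c [1.0, -1.0, -1.0, 1.0]
```

Filtering before the update means each batch is filled with tasks that actually carry signal, rather than spending compute on groups whose advantages cancel out.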
SOTA: SimpleVLA-RL achieves 98.4% on LIBERO 🎯 With only 1 trajectory/task for SFT: 🚀 LIBERO-Avg: 48.9%→94.1% 🚀 LIBERO-Long: 17.1%→91.8% (5/🧵)
Moonlighting a bit: we implemented online RL for VLA models with @verl🤖 and found that simple outcome rewards can work surprisingly well! The code is open-sourced: github.com/PRIME-RL/Simpl…. 🚀Thrilled to introduce SimpleVLA-RL! With only one trajectory per task for SFT, SimpleVLA-RL…
🛠️ Build your own LLM (or benchmark) analysis/debugging tool github.com/Zhiyuan-Zeng/E… 🚀 Try our demo zhiyuan-zeng.github.io/EvalTree 1️⃣ Explore what your benchmark actually evaluates — from coarse-grained capabilities like algebraic reasoning to fine-grained ones like calculating…
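A capability tree like the one EvalTree lets you explore can be pictured as nested nodes, each scoring the benchmark instances beneath it, so accuracy can be read at coarse or fine granularity. This is a hypothetical illustration only: `CapabilityNode`, `weak_nodes`, the 0.5 threshold, and the example capabilities are invented here, and the sketch does not reproduce how EvalTree actually builds its tree or describes weaknesses.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class CapabilityNode:
    """Hypothetical capability-tree node: a description plus per-instance correctness."""
    description: str
    results: list[bool] = field(default_factory=list)  # outcomes attached directly here
    children: list[CapabilityNode] = field(default_factory=list)

    def all_results(self) -> list[bool]:
        # Gather outcomes from this node and every descendant.
        out = list(self.results)
        for child in self.children:
            out.extend(child.all_results())
        return out

    def accuracy(self) -> float:
        rs = self.all_results()
        return sum(rs) / len(rs) if rs else float("nan")

def weak_nodes(node, threshold=0.5, min_size=2, path=()):
    """Return (path, accuracy) for every node below `threshold` with enough instances."""
    path = path + (node.description,)
    found = []
    rs = node.all_results()
    if len(rs) >= min_size and sum(rs) / len(rs) < threshold:
        found.append((" > ".join(path), sum(rs) / len(rs)))
    for child in node.children:
        found.extend(weak_nodes(child, threshold, min_size, path))
    return found

# Toy tree with made-up results: coarse root, two finer capabilities.
tree = CapabilityNode("Mathematical reasoning", children=[
    CapabilityNode("Algebraic reasoning", results=[True, True, True, False]),
    CapabilityNode("Geometric reasoning", results=[False, False, True]),
])

print(round(tree.accuracy(), 3))  # 0.571
for p, acc in weak_nodes(tree):
    print(p, round(acc, 3))  # Mathematical reasoning > Geometric reasoning 0.333
```

The root's 57% accuracy hides the weak subtree; walking down the tree surfaces it, which is the point a single accuracy number misses.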
🚀 Try our demo! 🌐 zhiyuan-zeng.github.io/EvalTree Explore LM performance interactively at different granularities on capability trees 🌳 Huge thanks to @xingyaow_ @gneubig & all the amazing authors of @allhands_ai, the agent who actually built it! 🙌 [8/n]
Is a single accuracy number all we can get from model evals?🤔 🚨Does NOT tell where the model fails 🚨Does NOT tell how to improve it Introducing EvalTree🌳 🔍identifying LM weaknesses in natural language 🚀weaknesses serve as actionable guidance (paper&demo 🔗in🧵) [1/n]