Mingyang Chen
@chen_mingyang
LLM Engineer @BaichuanAI | Ph.D. from @ZJU_China | Visiting Researcher @EdinburghUni. LLMs, Post-training, Reasoning.
🚀 Exciting news! Our open-source project ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning has just released its models, paper, and updated code! Check it out: github.com/Agent-RL/ReSea…
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
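A minimal sketch of the interleaved reason-search rollout the project describes: generation pauses at a search call, retrieved passages are spliced back into the context, and generation resumes. The tag names, the stand-in retriever, and the scripted model outputs are illustrative assumptions, not the repo's exact interface:

```python
# Hedged sketch of a ReSearch-style rollout loop. A real rollout would call
# the policy LLM with stop="</search>"; here the outputs are scripted so the
# example runs end to end.
import re

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)

def retrieve(query: str) -> str:
    """Stand-in retriever; a real system would query a search index."""
    return f"[top passages for: {query}]"

# Canned model outputs (illustrative only).
_scripted = iter([
    "<think>I need the capital of France.</think><search>capital of France</search>",
    "<think>The result says Paris.</think><answer>Paris</answer>",
])

def generate(prompt: str, stop: str) -> str:
    """Stand-in for an LLM call that returns text up to and including `stop`."""
    return next(_scripted)

def rollout(question: str, max_calls: int = 4) -> str:
    context = f"Question: {question}\n"
    for _ in range(max_calls):
        chunk = generate(context, stop="</search>")
        context += chunk
        match = SEARCH_RE.search(chunk)
        if match is None:  # no search call in this chunk: final answer reached
            break
        context += f"\n<result>{retrieve(match.group(1).strip())}</result>\n"
    return context

print(rollout("What is the capital of France?"))
```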
Many thanks to AK for sharing our work! Introducing "ReCode: Updating Code API Knowledge with Reinforcement Learning" — the RL framework that teaches models to update code API knowledge. Paper: huggingface.co/papers/2506.20… Code: github.com/zjunlp/ReCode 📚 Trained on 2K+ API…
ReCode: Updating Code API Knowledge with Reinforcement Learning
Introducing AutoMind: Adaptive Knowledgeable Agent for Automated Data Science Paper: arxiv.org/abs/2506.10974 Code (will be released soon): github.com/innovatingAI/A… Our latest work AutoMind is a new LLM agent framework that automates end-to-end machine learning pipelines by…
Impressive work on the reasoning model for AI in scientific research! Keep pushing the boundaries of LLM reasoning.
🧩 Introducing Cell-o1, a reasoning-enhanced LLM for batch-level single-cell annotation. 🧬 It takes on CellPuzzles, a new benchmark where each batch contains multiple cells with globally unique types – requiring joint reasoning over gene expression and context. 🧠 Not just…
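A tiny sketch of the batch constraint described above, assuming a prediction is valid only when it is a bijection between cells and candidate types; the function and names are illustrative, not from the Cell-o1 codebase:

```python
# Hedged reading of the CellPuzzles batch constraint: every cell in a batch
# must receive a *distinct* type drawn from the candidate set.
def is_valid_batch_annotation(predictions: list[str], candidate_types: set[str]) -> bool:
    return (
        len(predictions) == len(set(predictions))        # all assigned types distinct
        and set(predictions).issubset(candidate_types)   # all drawn from candidates
    )

# Example: a batch of 3 cells with 3 candidate types.
print(is_valid_batch_annotation(
    ["T cell", "B cell", "NK cell"], {"T cell", "B cell", "NK cell"}))  # True
print(is_valid_batch_annotation(
    ["T cell", "T cell", "NK cell"], {"T cell", "B cell", "NK cell"}))  # False
```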
We introduce Reinforcing "Cognitive Experts" – a new approach to enhance reasoning in MoE-based Large Reasoning Models (LRMs) 🌟. Thanks to Tencent's support, we had the opportunity to explore the inner workings of ultra-large models like DeepSeek-R1-671B and Qwen3-235B. By…
A conversation on the optimal reward for coding agents, infinite context models, and real-time RL
RL without ground-truth answers!
🎉 Introducing our latest work! 🚀 We propose a label-free method that enables RL without ground-truth answers, yet achieves impressive performance on mathematical tasks: 40.0% accuracy on AIME2024 🎯 with a 7B base model. Paper: huggingface.co/papers/2505.19…
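One common way to obtain a label-free reward is self-consistency: score each sampled answer by agreement with the group's majority vote, so no ground-truth label is needed. A minimal sketch of that general idea (not necessarily the exact reward used in the paper):

```python
# Hedged sketch of a majority-vote (self-consistency) reward for label-free RL.
from collections import Counter

def label_free_rewards(sampled_answers: list[str]) -> list[float]:
    # The most frequent answer in the group serves as a pseudo-label.
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if ans == majority else 0.0 for ans in sampled_answers]

# Example: 4 rollouts for one math prompt; agreement with the majority earns reward.
print(label_free_rewards(["40", "40", "42", "40"]))  # [1.0, 1.0, 0.0, 1.0]
```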
Chatting with a friend, he mentioned that he keeps looking for the "super nodes" around him, "training" himself through conversations with these high-quality interlocutors, much like training an LLM. LLM engineers can readily transfer their model-training experience to talking with people: 1️⃣ Curate the "corpus": find high-quality, well-matched conversation partners, just as an AI needs a quality dataset 2️⃣…
Is it possible to “give medicine” (vectors) to large language models to control their behavior at inference time? 🚀 Excited to share our ACL 2025 work: Steering Target Atoms (STA) — an approach to controlling LLM behavior at inference time without retraining. #ACL2025 #LLM #AI…
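For readers unfamiliar with activation steering, here is a minimal PyTorch sketch of the generic idea: add a fixed vector to one layer's output at inference time via a forward hook. STA itself targets specific atoms/components inside the model; this simplified version shifts a whole layer output and is purely illustrative:

```python
# Hedged sketch of generic activation steering at inference time.
import torch
import torch.nn as nn

hidden_dim = 16
layer = nn.Linear(hidden_dim, hidden_dim)        # stand-in for a transformer block
steering_vector = torch.randn(hidden_dim) * 0.1  # would be learned/extracted in practice

def steer(module, inputs, output):
    # Shift the layer's output along the steering direction.
    return output + steering_vector

handle = layer.register_forward_hook(steer)
x = torch.randn(2, hidden_dim)
steered = layer(x)   # hook applies the shift
handle.remove()
print(torch.allclose(steered, layer(x) + steering_vector))  # True
```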
THE WAY OF CODE, a project by @rickrubin in collaboration with Anthropic:
👏 Great work building on our open-source ReCall project! github.com/Agent-RL/ReCall
🌟 Introducing Tool-Star, a powerful RL-based multi-tool reasoner! 🚀Achieving impressive results on 10+ top reasoning benchmarks with only 3B params! 🥳 We’ve open-sourced all the code, datasets, and checkpoints Github: github.com/dongguanting/T… Paper: huggingface.co/papers/2505.16…
Giving your models more time to think before prediction, e.g. via smart decoding, chain-of-thought reasoning, latent thoughts, etc., turns out to be quite effective for unlocking the next level of intelligence. New post is here :) “Why we think”: lilianweng.github.io/posts/2025-05-…
🚨 New paper 🚨 J1: Incentivizing Thinking in LLM-as-a-Judge via RL - Converts judgement task into a verifiable one for both verifiable and non-verifiable prompts. Uses only synthetic pairwise data - Optimizes thoughts, scores, and judgments using GRPO - Outperforms all…
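A small sketch of how a pairwise judgment becomes verifiable: with synthetic pairs where the preferred response is known by construction, the judge's verdict can be scored exactly and used as an RL reward (e.g. inside GRPO). The "Verdict:" parsing format is an assumption, not the paper's exact protocol:

```python
# Hedged sketch of a verifiable pairwise-judge reward.
import re

def judge_reward(judge_output: str, known_better: str) -> float:
    """known_better is 'A' or 'B'; the judge is expected to end with 'Verdict: A' or 'Verdict: B'."""
    match = re.search(r"Verdict:\s*([AB])", judge_output)
    if match is None:
        return 0.0  # unparseable judgment gets no reward
    return 1.0 if match.group(1) == known_better else 0.0

print(judge_reward("...long chain of thought... Verdict: A", known_better="A"))  # 1.0
print(judge_reward("no clear verdict here", known_better="B"))                   # 0.0
```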
Nice work on efficient post-training for reasoning with search in LLMs!
RL for Search-Efficient LLMs

Presents a new post-training RL framework that explicitly trains LLMs to optimize search usage. Recipe: structured reasoning template and reward policy + GRPO. Leads to smarter and more efficient reasoning and retrieval of external knowledge.
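A minimal sketch of a reward of the kind this recipe suggests: reward answer correctness, then subtract a small cost per search call so the policy learns to retrieve only when needed. The coefficients and shaping are assumptions, not the published reward:

```python
# Hedged sketch of a search-efficiency reward for RL post-training.
def search_efficiency_reward(correct: bool, num_searches: int,
                             search_cost: float = 0.1) -> float:
    base = 1.0 if correct else 0.0
    return base - search_cost * num_searches

# A correct answer with 1 search beats a correct answer with 4 searches.
print(search_efficiency_reward(True, 1))   # 0.9
print(search_efficiency_reward(True, 4))   # 0.6
print(search_efficiency_reward(False, 0))  # 0.0
```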
🚨 New Blog Drop! 🚀 "Reflection on Knowledge Editing: Charting the Next Steps" is live! 💡 Ever wondered why knowledge editing in LLMs still feels more like a lab experiment than a real-world solution? In this post, we dive deep into where the research is thriving — and where…
isn’t it weird that we spend >99% of RL compute on the agent and <1% on the environment (eg python runtime)? who is working on scaling environment compute? and is there an optimal agent-environment compute frontier for intelligence?
Honored to contribute to a survey on Full-Stack LLM Safety 🔒🤖 A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment ArXiv: arxiv.org/abs/2504.15585 We go beyond existing works by covering the entire lifecycle of LLMs—from data to deployment…
1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks like SWE-Bench. We also open-source a series of our early trained models to showcase the potential of end-to-end online RL training on long-horizon (20-50…
Thanks to Mr. Ma for sharing the work we have been pushing forward recently, and for the very insightful take! Stay tuned!
[Reasoning, Agent] WIP paper ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning. ReCall is a comprehensive upgrade of ReSearch: not just search, it lets LLMs coordinate multiple external tools, an important step toward general-purpose agents. Starting point: ReSearch = "learning to search repeatedly". The core method uses structured symbols ("protocol…
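A minimal sketch of the kind of structured tool-call protocol the post describes: the model emits a tagged JSON call, a dispatcher routes it to a registered tool, and the result is fed back for the next reasoning step. The tag format, tool names, and registry are illustrative assumptions, not ReCall's exact protocol:

```python
# Hedged sketch of a multi-tool dispatcher for a ReCall-style agent loop.
import json
import re

TOOLS = {
    "search": lambda args: f"[passages for {args['query']}]",
    "python": lambda args: str(eval(args["code"])),  # toy sandbox; never eval untrusted code in practice
}

CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def dispatch(model_output: str) -> str | None:
    match = CALL_RE.search(model_output)
    if match is None:
        return None  # no tool call: the output is the final answer
    call = json.loads(match.group(1))
    return TOOLS[call["name"]](call["arguments"])

print(dispatch('<tool_call>{"name": "python", "arguments": {"code": "2**10"}}</tool_call>'))
# -> 1024
```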