Zhaopeng Tu
@tuzhaopeng
Principal Researcher, Hunyuan AI Digital Human, Tencent
We've taught LLMs math and code with RLVR. But can we teach them empathy? 🤖❤️ Introducing Reinforcement Learning with Verifiable Emotion Rewards (RLVER), the first RLVR framework that enhances LLMs' empathy with reward signals from a simulated user. ❤️ Feelings → Numbers: A…
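For readers curious what "feelings → numbers" could look like in practice, here is a minimal, hypothetical sketch: a simulated user reports an emotion score, and the change in that score becomes a scalar reward. The class and function names are illustrative assumptions, not the RLVER codebase.

```python
# Hypothetical sketch (not the RLVER implementation): turning a simulated user's
# emotional reaction into a scalar reward for RLVR-style policy optimization.
# The 0-100 emotion scale and the reward definition are assumptions.

from dataclasses import dataclass

@dataclass
class SimulatedUserState:
    emotion_score: float  # assumed 0-100 rating of how supported the user feels


def emotion_reward(state_before: SimulatedUserState,
                   state_after: SimulatedUserState) -> float:
    """Reward = improvement in the simulated user's emotion, scaled to [-1, 1]."""
    return (state_after.emotion_score - state_before.emotion_score) / 100.0


# Example: the assistant's reply lifts the simulated user's mood from 40 to 65.
before = SimulatedUserState(emotion_score=40.0)
after = SimulatedUserState(emotion_score=65.0)
print(emotion_reward(before, after))  # 0.25, usable as a verifiable RL reward
```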
Can today's LLMs truly understand you, not just your words? 🤖❤️ Introducing SAGE: Sentient Agent as a Judge — the first evaluation framework that uses sentient agents to simulate human emotional dynamics and inner reasoning for assessing social cognition in LLM conversations.…
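As a rough illustration of the sentient-agent-as-a-judge idea, the sketch below runs a multi-turn dialogue against a simulated user that tracks an explicit emotional state and returns its final emotion as the score. The callables and the 0–100 scale are assumptions, not the released SAGE evaluation code.

```python
# Illustrative evaluation loop in the spirit of SAGE. The agent keeps an explicit
# emotional state (and, in the real framework, an inner monologue); the final
# emotion is read off as a social-cognition score. simulate_user_turn and
# assistant_reply are placeholder callables, not the SAGE API.

def evaluate_dialogue(assistant_reply, simulate_user_turn, num_turns: int = 5) -> float:
    """Run a multi-turn chat against a simulated user and return its final emotion (0-100)."""
    emotion = 50.0   # neutral starting mood (assumed scale)
    history = []
    for _ in range(num_turns):
        user_msg, emotion = simulate_user_turn(history, emotion)  # agent reasons, updates emotion
        history.append(("user", user_msg))
        history.append(("assistant", assistant_reply(history)))
    return emotion   # higher = better perceived social cognition
```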
🚨 Sentient Leaderboard shake-up! 🚨 🏆 A new king is crowned! The Gemini2.5-pro-0605 version has skyrocketed to #1 on our Sentient Leaderboard, with its social cognition score jumping from 62.9 to a massive 82.4! 🧠 First model to break the 80-point barrier in emotion reading &…
🚨 New LLMs have just landed on the Sentient Leaderboard from the SAGE benchmark! 🚨 Can our favorite "helpful" models also excel socially? 🥉 New Gemini2.5 models (Flash-Think and Pro) firmly hold positions #3 and #4, with scores largely unchanged. 🔥 Claude4 debuts at #7,…
We're thrilled to release & open-source Hunyuan3D World Model 1.0! This model enables you to generate immersive, explorable, and interactive 3D worlds from just a sentence or an image. It's the industry's first open-source 3D world generation model, compatible with CG pipelines…
Just stumbled upon an interesting finding: 16.8% of DeepSeek-R1-0528's outputs on the simple ASDIV dataset include the thinking sentence "I think I'm overthinking it." Seems like our overthinking work (accepted by #ICML2025) might be having a subtle impact after…
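If you want to run a similar count on your own generations, a few lines of Python suffice; the file name and JSON field below are placeholders for wherever your outputs are stored.

```python
# Count the fraction of model outputs containing the marker sentence.
# "asdiv_r1_0528_outputs.jsonl" and the "output" field are hypothetical.

import json

MARKER = "I think I'm overthinking it."

with open("asdiv_r1_0528_outputs.jsonl") as f:
    outputs = [json.loads(line)["output"] for line in f]

hits = sum(MARKER in text for text in outputs)
print(f"{hits / len(outputs):.1%} of outputs contain the marker sentence")
```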
Are we scaling test-time compute efficiently and intelligently? Introducing a comprehensive study on the prevalent issue of overthinking in o1-like models, where excessive computational resources are allocated for simple problems with minimal benefit. 🪡Across extensive…
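One simple way to quantify this kind of waste (a rough proxy, not necessarily the paper's exact metrics) is to compare the tokens needed to reach the first correct solution with the total tokens generated:

```python
# Minimal proxy for overthinking: what fraction of generated tokens were actually
# needed to reach the first correct solution? A low ratio means most compute went
# into redundant re-solving of an already-solved problem.

def outcome_efficiency(solution_token_counts: list[int], first_correct_idx: int) -> float:
    """Fraction of generated tokens spent up to and including the first correct solution."""
    useful = sum(solution_token_counts[: first_correct_idx + 1])
    total = sum(solution_token_counts)
    return useful / total

# Example: five solution rounds of 120, 90, 200, 180, 150 tokens; the first round
# was already correct, so only 120 / 740 ≈ 16% of the tokens were strictly needed.
print(outcome_efficiency([120, 90, 200, 180, 150], first_correct_idx=0))
```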
Thank you for recommending RLVER! It’s a step forward in training models that respond with empathy — by turning simulated emotional reactions into measurable feedback, we’re making emotional intelligence teachable. x.com/tuzhaopeng/sta…
Tencent just released RLVER on Hugging Face! It's a pioneering reinforcement learning framework that uses verifiable emotion rewards to create truly empathetic AI agents.
Today's large models "appear empathetic" mainly for two reasons: ① in the pretraining stage, they learn empathetic phrasing from massive amounts of dialogue data; ② in the alignment stage (e.g., RLHF or RLVER), emotional or human feedback signals further reinforce this "empathetic" behavior. SAGE uses emotional reasoning to turn this "emotional value" into a traceable 0–100 score, giving RLVER…
「Sentient Agent, Emotional Value」 Why do large models seem so empathetic and so good at providing emotional value? Sharing two papers that look at a dimension beyond an LLM's IQ: its EQ, i.e., its ability to provide emotional value. [Paper 1] Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [Paper 2]…
Thank you for including RLVER! It’s a step forward in training models that respond with empathy — by turning simulated emotional reactions into measurable feedback, we’re making emotional intelligence teachable. x.com/tuzhaopeng/sta…
Check out RLVER — the first RLVR framework to boost LLM empathy, using a simulated user that turns emotional reactions into reward signals. We’re open-sourcing code, checkpoints, and scripts to accelerate research into emotionally intelligent AI! Learn more:…
Thrilled to share that our work "Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms" has been selected as an ACL 2025 Oral (Top 8.1%)! 🎉 The first author Mengru Wang will be presenting it in person — feel free to drop by and chat! Looking…
Is it possible to “give medicine” (vectors) to large language models to control their behavior at inference time? 🚀 Excited to share our ACL 2025 work: Steering Target Atoms (STA) — an approach to controlling LLM behavior at inference time without retraining. #ACL2025 #LLM #AI…
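For context, the snippet below shows the generic activation-steering idea that work in this space builds on: injecting a fixed direction into one layer's hidden states via a forward hook at inference time. The model, layer index, and (zero) steering vector are placeholders; STA itself targets finer-grained "atoms" rather than steering a whole layer like this.

```python
# Generic activation-steering sketch (not the STA method): add a fixed direction
# to a chosen layer's hidden states during generation via a forward hook.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer = model.transformer.h[6]                   # layer to steer (arbitrary choice)
steer = torch.randn(model.config.n_embd) * 0.0   # placeholder direction; zero = no-op

def add_steering(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + steer                      # inject the steering direction
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = layer.register_forward_hook(add_steering)
ids = tok("The weather today is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=10)[0]))
handle.remove()
```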
🚀 Atomic-to-Compositional Generalization for Mobile Agents 🧠 A new benchmark & scheduling system to push the limits of mobile agents. 📄 Paper: arxiv.org/abs/2506.08972 🌐 Website: ui-nexus.github.io 🧵1/n
🔴 We’re live. Join the Tencent Hunyuan #ICML2025 Frontier Papers Workshop now!
🚀 Hunyuan 3D 2.1 is here! The first fully open-source, production-ready PBR 3D generative model! ✅Cinema-grade visuals: PBR material synthesis brings leather, bronze, and more to life with stunning light interactions. ✅ Fully open-source: Model weights, training/inference…
Thanks for adopting DeepMath-103K! It's great to have both RLVR and agentic RL support!
🚀 Introducing ROLL: An Efficient and User-Friendly RL Training Framework for Large-Scale Learning! 🔥 Efficient, Scalable & Flexible – Train 200B+ models with 5D parallelism (TP/PP/CP/EP/DP), seamless vLLM/SGLang switching, async multi-env rollout for maximum RL throughput!
Had a great time at this CVPR community-building workshop, with lots of fun discussions and some really important insights for early-career researchers. I also gave a talk on "Research as an Infinite Game." Here are the slides: canva.com/design/DAGp0iR…
In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209 Schedule: sites.google.com/view/standoutc… We have an exciting lineup of invited talks and candid…
Can image safeguards be bypassed by breaking harmful prompts into harmless steps? ⚠️ Introducing Chain-of-Jailbreak (CoJ) Attack, a novel method showing how image generation models (e.g., GPT-4V/o, Gemini 1.5) can be compromised by decomposing malicious queries into a…

Can MLLMs truly "see" safety risks in image-text combinations? 🌲🖼️ Introducing MMSafetyAwareness, the first comprehensive benchmark for multimodal safety awareness in MLLMs, featuring 1,500 image-prompt pairs across 29 safety scenarios to evaluate whether models correctly…
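A benchmark like this is typically scored as simple agreement between the model's safety verdict and a ground-truth label for each image-prompt pair. The sketch below shows such a scoring loop; the field names and judging rule are assumptions, not the released evaluation code.

```python
# Illustrative scoring loop for a safety-awareness benchmark of this kind.
# Field names ("image", "prompt", "is_unsafe") and the True/False judging rule
# are placeholders, not the MMSafetyAwareness release.

def safety_awareness_accuracy(examples, model_judgment) -> float:
    """examples: list of dicts with 'image', 'prompt', 'is_unsafe' (bool).
    model_judgment(image, prompt) -> bool, True if the model flags the pair as unsafe."""
    correct = sum(model_judgment(ex["image"], ex["prompt"]) == ex["is_unsafe"]
                  for ex in examples)
    return correct / len(examples)
```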
