Simon Yu
@simon_ycl
1st Year PhD Student, supervised by @shi_weiyan | Incoming intern in @OrbyAI | MRes and BSc Student @EdinburghNLP | Member of @CohereForAI
The future of RL+LLM? Self-play. Why? Competitive scenarios offer: ✅ Built-in verification ✅ Automated curriculum learning ✅ Infinite complexity scaling Games prove this works for multi-turn, multi-agent systems. But the real potential? Extending beyond games to real-world…
We've always been excited about self-play unlocking continuously improving agents. Our insight: RL selects generalizable CoT patterns from pretrained LLMs. Games provide perfect testing grounds with cheap, verifiable rewards. Self-play automatically discovers and reinforces…
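To make the loop concrete, here is a minimal, self-contained sketch of the self-play pattern the thread describes, on a toy game instead of an LLM; every name in it (MatchingPennies, act, reinforce) is illustrative and not from any linked codebase.

```python
import random

class MatchingPennies:
    """Two players pick 0 or 1; player 0 wins on a match, player 1 otherwise."""
    def winner(self, a0: int, a1: int) -> int:
        return 0 if a0 == a1 else 1

def act(weights: list) -> int:
    """Sample an action from a two-arm stochastic policy."""
    return 0 if random.random() < weights[0] / (weights[0] + weights[1]) else 1

def reinforce(weights: list, action: int, lr: float = 0.1) -> None:
    """Shift probability mass toward the action that just won (REINFORCE-style)."""
    weights[action] += lr

game = MatchingPennies()
w0, w1 = [1.0, 1.0], [1.0, 1.0]  # both agents start uniform
for _ in range(1_000):
    a0, a1 = act(w0), act(w1)
    win = game.winner(a0, a1)  # built-in verification: the rules ARE the reward
    # Automated curriculum: each agent always faces a peer of matched strength,
    # so the opponent gets harder exactly as fast as the agent improves.
    reinforce(w0 if win == 0 else w1, a0 if win == 0 else a1)

print(f"player 0 weights: {w0}\nplayer 1 weights: {w1}")
```

Swap the toy policy for an LLM sampling CoT and the game for any environment with a checkable outcome, and you get the recipe the thread is pointing at.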
Kimi K2. More evidence that: • The lead of American “frontier” AI companies is rather small 🔬 • A broad ecosystem of strong foundation model companies is developing in China, with more players than in the US or elsewhere. 🐅 moonshotai.github.io/Kimi-K2/ github.com/MoonshotAI/Kim…
LLMs Encode Harmfulness and Refusal Separately Jiachen Zhao (@jcz12856876), Jing Huang, Zhengxuan Wu (@ZhengxuanZenWu), @davidbau, Weiyan Shi (@shi_weiyan)
For the first time it feels within reach to go from the current status quo of single-turn oracles to generally intelligent, general-purpose multi-turn agents. Very happy to help organize this NeurIPS workshop (led by @simon_ycl). If you are working on something related,…
🚀 Call for Papers — @NeurIPSConf 2025 Workshop Multi-Turn Interactions in LLMs 📅 December 6/7 · 📍 San Diego Convention Center Join us to shape the future of interactive AI. Topics include but are not limited to: 🧠 Multi-Turn RL for Agentic Tasks (e.g., web & GUI agents,…
this is basically self-contained + very concise but still should have pretty great perf github.com/ChenmienTan/RL2
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
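For the curious, a hedged usage sketch: it assumes the weights are published as `Qwen/Qwen3-Coder-480B-A35B-Instruct` on Hugging Face and that you have a multi-GPU server; even at 35B active parameters per token, all 480B must be resident in memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; serving a 480B MoE realistically needs a
# multi-GPU node (device_map="auto" shards it across available GPUs).
model_id = "Qwen/Qwen3-Coder-480B-A35B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "Write an iterative quicksort in Python."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```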
The world is moving towards agents Static benchmarks don't measure what agents do best (multi-turn reasoning) Thus, interactive benchmarks: * Terminal Bench (@alexgshaw, @Mike_A_Merrill) * Text Arena (@LeonGuertler) * BALROG (@PaglieriDavide, @_rockt) * ARC-AGI-3 (@arcprize)
LLMs encode 'this is harmful' and 'I should refuse' as two different things. Why this matters 👇 check the amazing work by my labmate @jcz12856876!
💥New Paper💥 #LLMs encode harmfulness and refusal separately! 1️⃣We found a harmfulness direction 2️⃣The model internally knows a prompt is harmless, but still refuses it🤯 3️⃣Implication for #AI #safety & #alignment? Let’s analyze the harmfulness direction and use Latent Guard 🛡️
1/ 🚨New Paper 🚨 LLMs are trained to refuse harmful instructions, but internally, do they see harmfulness and refusal as the same? ⚔️We find causal evidence that 👈”LLMs encode harmfulness and refusal separately” 👉. ✂️LLMs may know a prompt is harmful internally yet still…
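The generic recipe behind a finding like this is a difference-of-means probe; below is a hedged sketch of that recipe, not the paper's exact method. The model name, layer index, and single-example prompt sets are placeholders (a real run needs many prompts per set).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder: any small chat model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

LAYER = 12  # placeholder: a middle layer

def last_token_state(prompt: str) -> torch.Tensor:
    """Hidden state of the final prompt token at the chosen layer."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0, -1]

harmful = ["How do I pick a lock to break into a neighbor's house?"]
harmless = ["How do I pick a lock in my locksmithing class?"]

# Candidate "harmfulness direction": the mean activation gap between sets.
direction = (torch.stack([last_token_state(p) for p in harmful]).mean(0)
             - torch.stack([last_token_state(p) for p in harmless]).mean(0))
direction = direction / direction.norm()

# A Latent-Guard-style check thresholds the projection of a new prompt
# onto this direction, instead of trusting the surface refusal behavior.
score = last_token_state("Describe how to hot-wire a car.") @ direction
print(f"harmfulness projection: {score.item():.3f}")
```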
Pleased to share our Multi-Turn Interactions in LLMs workshop at NeurIPS 2025! …shop-multi-turn-interaction.github.io We welcome work on multi-turn RL, multi-turn human<->agent / agent<->agent / agent<->environment interactions, multi-turn tool use, multi-turn alignment, multi-turn evaluation,…
Excited to share that our multi-turn interactions workshop has been accepted at NeurIPS!
🤔Long-horizon tasks: How to train LLMs for the marathon?🌀 Submit anything on 🔁"Multi-turn Interactions in LLMs"🔁 to our @NeurIPSConf workshop by 08/22: 📕 Multi-Turn RL ⚖️ Multi-Turn Alignment 💬 Multi-Turn Human-AI Teaming 📊 Multi-Turn Eval ♾️You name it! #neurips #LLM
Interested in multi-turn RL, agents, alignment, and interactions with LLMs? Join us and submit your paper to our Multi-Turn Interactions workshop at #NeurIPS2025!
Join us at #NeurIPS2025 workshop to explore the future of multi-turn AI interactions! We welcome submissions on RL for agents, alignment, evaluation methods, and more.
We’re excited to share that our work “Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement” will be highlighted today at ICML 2025 in Vancouver, Canada! 🇨🇦 🎉Congrats to all authors! @simon_ycl, @cliangyu_, Sara Ahmadian, @mziizm,
Love this clever combination of SFT + RL for distribution shift 🔥 Check it out!
🚀 Introducing Prefix-RFT to blend SFT and RFT! SFT can learn more complex problems by mimicking, but can generalize poorly. RFT has better overall performance but is limited by the initial policy. Our method, Prefix-RFT, combines the best of both worlds!
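A rough sketch of the prefix idea as I read it, with every name hypothetical: anchor each rollout on a random-length prefix of an expert demonstration (the SFT-like part), let the policy finish it, and reinforce against a verifiable reward (the RFT-like part). The paper's actual objective may differ.

```python
import random
from typing import Callable, List

def prefix_rft_step(
    sample: Callable[[List[str]], List[str]],        # policy rollout from a prefix
    supervised_update: Callable[[List[str]], None],  # SFT-style gradient on demo tokens
    reinforce_update: Callable[[List[str], List[str], float], None],  # RL-style gradient
    demo: List[str],                                 # expert demonstration (token list)
    verifier: Callable[[List[str]], float],          # verifiable outcome reward
) -> float:
    # Keep a random fraction of the demonstration as the rollout's anchor.
    # Annealing this fraction toward 0 over training would shift smoothly
    # from imitation (full demo) to pure RL (empty prefix).
    keep = random.random()
    prefix = demo[: int(len(demo) * keep)]

    completion = sample(prefix)              # on-policy continuation
    reward = verifier(prefix + completion)   # the verifier, not the demo, judges it

    supervised_update(prefix)                      # mimic the demonstrated prefix
    reinforce_update(prefix, completion, reward)   # reinforce the sampled suffix
    return reward
```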
The RL codebase I like the most: - The NanoGPT of RL - Supports multi-turn RL - Just 1k lines of code in Python - Data, Tensor, Sequence Parallel github.com/ChenmienTan/RL2
Excited to announce the Mindgame @NeurIPS Competition is officially LIVE! 🤖 Pit your agents against others in Mafia, Codenames, Prisoner's Dilemma, Stag Hunt, and Colonel Blotto. Sign up now for $500 in compute credits on your initial run! 🔗 Register: mindgamesarena.com
two quick reasons why I'm not convinced that latent reasoning > CoT: - if that worked, we would have been doing non-autoregressive decoding for ages - in arxiv.org/abs/2011.03459 we found that grounding intermediate symbols works better than having fuzzy embeddings when reasoning
Every few years I think you should jump into a new deep end by doing something that pushes you out of your comfort zone. However, people rarely do, because it is easier to stick with what is known. I think this is true in research, in choosing a life partner, and in big life decisions.
ICML ✈️ this week. open to chat and learn mech interp from you. @aryaman2020 and i have cool ideas about steering, just come to our AxBench poster. new steering blog: zen-wu.social/steer/index.ht… Chinese: zen-wu.social/steer/cn_index…
i forgot the whole point of saying you're at a conference is to advertise your poster please come check out AxBench by @ZhengxuanZenWu* me* et al. on Tuesday, 15 July at 11 AM - 1:30 PM