Yunzhi Yao
@yyzTodd
Visiting PhD @UCLA, PhD candidate @ZJU_China; previously @MSFTResearch @AlibabaGroup
🚨 New Blog Drop! 🚀 "Reflection on Knowledge Editing: Charting the Next Steps" is live! 💡 Ever wondered why knowledge editing in LLMs still feels more like a lab experiment than a real-world solution? In this post, we dive deep into where the research is thriving — and where…
Many thanks to AK for sharing our work! Introducing "ReCode: Updating Code API Knowledge with Reinforcement Learning" — the RL framework that teaches models to update code API knowledge. Paper: huggingface.co/papers/2506.20… Code: github.com/zjunlp/ReCode 📚 Trained on 2K+ API…
ReCode: Updating Code API Knowledge with Reinforcement Learning
Meet Embodied Web Agents that bridge physical-digital realms. Imagine embodied agents that can search for online recipes, shop for ingredients, and cook for you. Embodied web agents search the web for information to carry out real-world embodied tasks. All data, code and web…
Introducing AutoMind: Adaptive Knowledgeable Agent for Automated Data Science Paper: arxiv.org/abs/2506.10974 Code (will be released soon): github.com/innovatingAI/A… Our latest work AutoMind is a new LLM agent framework that automates end-to-end machine learning pipelines by…
🚨 New work: LLMs still struggle at Event Detection due to poor long-context reasoning and inability to follow task constraints, causing precision and recall errors. We introduce DiCoRe — a lightweight 3-stage Divergent-Convergent reasoning framework to fix this.🧵📷 (1/N)
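A minimal sketch of one way a three-stage divergent-convergent pipeline for event detection could look. The stage prompts, the toy ontology, and the `llm` helper are all hypothetical illustrations, not the paper's actual design:

```python
# Hypothetical reading of a divergent-convergent event-detection pipeline:
# 1) divergent: brainstorm candidate events freely,
# 2) convergent: ground them onto the allowed event ontology,
# 3) verify against the task constraints.
EVENT_TYPES = ["Attack", "Meet", "Transport"]  # toy ontology, for illustration only

def divergent_convergent_detect(text, llm):
    # `llm` is a hypothetical callable: prompt in, completion out
    candidates = llm(f"List every event mentioned in the text, free-form:\n{text}")
    grounded = llm(
        f"Map each candidate event to one of {EVENT_TYPES}, or discard it.\n"
        f"Candidates: {candidates}\nText: {text}"
    )
    verified = llm(
        f"Keep only events whose type is in {EVENT_TYPES} and whose trigger appears verbatim in the text.\n"
        f"Events: {grounded}\nText: {text}"
    )
    return verified
```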
The unreasonable effectiveness of model merging for cross-lingual transfer! Our preprint evaluates a number of *modular* approaches to fine-tuning LLMs that "assign" model params to either task or language. Surprisingly, merging experts beats all! 🧵1/4 arxiv.org/abs/2505.18356
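A minimal sketch of the simplest form of weight-space merging, plain parameter interpolation between a task expert and a language expert; the checkpoint names and the interpolation weight are illustrative, and the paper evaluates several merging schemes beyond this one:

```python
import torch

# Hypothetical checkpoints: the same base model fine-tuned on the task (in English)
# and on the target language, respectively.
task_sd = torch.load("task_expert.pt", map_location="cpu")
lang_sd = torch.load("lang_expert.pt", map_location="cpu")

alpha = 0.5  # interpolation weight between the two experts
merged = {k: alpha * task_sd[k] + (1 - alpha) * lang_sd[k] for k in task_sd}

# Load `merged` back into the base architecture for zero-shot cross-lingual transfer.
torch.save(merged, "merged_expert.pt")
```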
🌏How culturally safe are large vision-language models? 👉LVLMs often miss the mark. We introduce CROSS, a benchmark of 1,284 image-query pairs across 16 countries & 14 languages, revealing how LVLMs violate cultural norms in context. ⚖️ Evaluation via CROSS-EVAL 🧨 Safety…
We introduce Reinforcing "Cognitive Experts" – a new approach to enhance reasoning in MoE-based Large Reasoning Models (LRMs) 🌟. Thanks to Tencent's support, we had the opportunity to explore the inner workings of ultra-large models like DeepSeek-R1-671B and Qwen3-235B. By…
Are MoE reasoning models already equipped with the right "brains" -- and just need a push? 🧠 Introducing Reinforcing Cognitive Experts (RICE), a simple, yet powerful inference-time approach that boosts reasoning accuracy by selectively strengthening just 2 cognitive experts in…
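A toy illustration of the idea of strengthening a few experts at inference time by biasing the MoE router toward them. The expert indices, bias strength, and the stand-in router below are illustrative, not the actual DeepSeek-R1 or Qwen3 internals or RICE's exact mechanism:

```python
import torch
import torch.nn.functional as F

num_experts, top_k = 8, 2
cognitive_experts = [3, 5]   # hypothetical indices of the "cognitive" experts
boost = 2.0                  # additive bias on their router logits

# Stand-in for one token's router output inside an MoE layer.
router_logits = torch.randn(1, num_experts)
router_logits[:, cognitive_experts] += boost  # reinforce the chosen experts

weights, chosen = torch.topk(F.softmax(router_logits, dim=-1), k=top_k, dim=-1)
print(chosen)  # the boosted experts are now far more likely to be routed to
```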
new paper! 🌱 Collapse of Dense Retrievers We uncover major vulnerabilities in dense retrievers like Contriever, showing they favor: 📌 Shorter docs 📌 Early positions 📌 Repeated entities 📌 Literal matches ...all while ignoring the answer's presence! huggingface.co/datasets/mohse…
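A small probe, in the spirit of the position-bias finding, that scores the same answer placed early vs. late in a document with Contriever; the query and documents are toy examples, and mean pooling is the commonly used recipe for Contriever embeddings:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever").eval()

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)  # mean pooling over non-pad tokens

query = "Who wrote Hamlet?"
docs = [
    "William Shakespeare wrote Hamlet. " + "Unrelated filler text. " * 20,  # answer early
    "Unrelated filler text. " * 20 + "William Shakespeare wrote Hamlet.",   # answer late
]
q, d = embed([query]), embed(docs)
# A position-biased retriever scores the first document noticeably higher.
print(torch.nn.functional.cosine_similarity(q, d))
```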
🚀 Excited to introduce EasyEdit2 — a powerful upgrade to EasyEdit, now redesigned for unified, plug-and-play LLM behavior steering at inference time! youtu.be/AkfoiPfp5rQ?si…
🚀 Excited to introduce EasyEdit2 — a powerful upgrade to EasyEdit, now redesigned for unified, plug-and-play LLM behavior steering at inference time! #EasyEdit #LLM #ModelSteering #ModelEditing #KnowledgeEditing #EasyEdit2 #AI #InferenceTimeControl ✨ No retraining — just…
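A generic activation-steering sketch of the kind of inference-time behavior control described above (this is not EasyEdit2's API): a steering vector is added to one transformer block's hidden states via a forward hook, with no retraining. The layer index, vector, and strength are illustrative; in practice the direction would be learned or extracted from data:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

layer, alpha = 6, 4.0
steer_vec = torch.randn(model.config.n_embd)  # placeholder for an extracted steering direction

def steer(module, inputs, output):
    # Shift this block's hidden states toward the target behavior.
    hidden = output[0] + alpha * steer_vec
    return (hidden,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(steer)
ids = tok("The weather today is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()  # removing the hook restores the unsteered model
```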
#GPT4o image generation brings synthetic visual data quality to the next level. 🖼️ 🤔Is synthetic visual data finally ready to be used for improving VLMs? 🚀 We show success with CoDA, using contrastive visual data augmentation to help teach VLMs novel and confusing concepts.
Excited to speak more about AI creativity at SSNLP today in Singapore ssnlp-website.github.io/ssnlp25/ Also looking forward to hearing what the Qwen team has to say about their latest breakthrough! Friends in Singapore: let’s catch up!
📣 For this week’s NLP Seminar, we are thrilled to host Zhe Gan @zhegan4 to give a talk titled “How to Build Your Multimodal LLMs: From Pre-training to Post-training and Agents”! 🗓️ 4/11 Fri 2pm PT Registration: forms.gle/TNXfBZJiMJjL18…
Wrote a blog to share my recent thoughts on knowledge boundaries, tool use, and language agents. This is the first time I propose three laws of knowledge boundaries!🔥 candle-walker-56d.notion.site/NAACL-2025-Ora… Chinese Version: mp.weixin.qq.com/s/XzjiLUFAr1Yc…
Introducing How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training 🔍🧠 Our latest work dives into the mechanism of new knowledge acquisition in LLMs, revealing how computational subgraphs— “knowledge circuits”—adapt and evolve during…
🍰Check out our latest work, CaKE on Knowledge Editing!
🍰 Introducing CaKE: Circuit-aware Knowledge Editing for LLMs! 🚀 Current knowledge editing methods update single facts but struggle with multi-hop reasoning. We propose CaKE to solve this by aligning edits with the model's reasoning pathways, enabling accurate and consistent…
Introducing LightThinker: Step-by-Step Compression for LLMs 🚀 LightThinker is a new method that enables Large Language Models (LLMs) to dynamically compress intermediate thoughts during reasoning, reducing memory overhead and computational costs while maintaining competitive…
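A conceptual sketch of step-by-step thought compression, not LightThinker's actual training recipe: every few reasoning steps, the accumulated thoughts are replaced by a short compressed state, so the context the model attends to stays small. The `llm_generate` helper and the prompt wording are hypothetical:

```python
def solve_with_compression(question, llm_generate, max_steps=12, compress_every=3):
    # `llm_generate` is a hypothetical helper: prompt in, completion out.
    context = question
    for step in range(max_steps):
        thought = llm_generate(f"{context}\nNext reasoning step:")
        context = f"{context}\n{thought}"
        if "final answer" in thought.lower():
            return thought
        if (step + 1) % compress_every == 0:
            # Compress everything reasoned so far into a compact state and drop the full trace.
            summary = llm_generate(f"Compress the reasoning so far into a few sentences:\n{context}")
            context = f"{question}\nCompressed reasoning state: {summary}"
    return llm_generate(f"{context}\nFinal answer:")
```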