Ningyu Zhang@ZJU
@zxlzr
Associate Professor @ZJU_China. Research interests include NLP, LLM, KG, Agent, Knowledge Editing.
🚀 This year, we’ve rolled out a series of updates to EasyEdit1—and dropped EasyEdit2 to steer LLM behavior on the fly! 🔧✨ 👉 Code: github.com/zjunlp/EasyEdit What’s new? • Datasets: Integrated AKEW, LEME & UNKE • Methods: NAMET, CORE, UNKE, AnyEdit & Reference-free Preference…


This article from @TheEconomist offers an accurate overview of key dynamics shaping the development of AI today: the risks of the rapid race toward AGI and ASI, the challenges posed by open-sourcing frontier models, the deep uncertainty revealed by ongoing scientific debates and…
Due to a scheduling conflict, I won’t be able to attend #ACL2025 in person. Our group will be presenting the following works—feel free to connect and chat with our team members at the conference! Main conferences: Beyond Prompt Engineering: Robust Behavior Control in LLMs via…

Thrilled to introduce "𝗗𝗲𝗲𝗽 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵𝗲𝗿 𝘄𝗶𝘁𝗵 𝗧𝗲𝘀𝘁-𝗧𝗶𝗺𝗲 𝗗𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻," a new deep research agent designed to mimic the iterative nature of human research, complete with cycles of planning, drafting, and revision. 🚀🚀 arxiv.org/pdf/2507.16075
Do you find RL makes LLM reasoning more stubborn, with the model repeating the same answers? How can multi-turn conversational history help RL training? We identify that a simple "try again" feedback can boost reasoning and turn RL training into a conversational process!…
Will conversation history help reasoning? We found that when models mess up once, they often get stuck. Surprisingly, a simple “try again” fixes this — and boosts reasoning.🧵 Project Page: unary-feedback.github.io
New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
WHY do you prefer something over another? Reward models treat preference as a black-box😶🌫️but human brains🧠decompose decisions into hidden attributes We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨ Why it matters👉🏻🧵
🚨 The era of infinite internet data is ending, So we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
Multimodal models still leak harmful text when attackers mix tricky words and images. AutoSteer adds a lightweight layer-aware safety prober and refusal head to frozen multimodal LLMs, driving attack success below 5% while leaving regular performance unchanged. A safety awareness…
A simple AGI safety technique: the AI’s thoughts are in plain English, so just read them. We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc. threaten transparency. Experts from many orgs agree we should try to preserve it:…
What happened after Dream 7B? First, Dream-Coder 7B: a fully open diffusion LLM for code delivering strong performance, trained exclusively on public data. Plus, DreamOn cracks the variable-length generation problem! It enables code infilling that goes beyond a fixed canvas.
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech int methods implicitly rely on the linear representation hypothesis🧵
GDM interp work: Do LLMs have self-preservation? Concerning recent work: models may block shutdown if it interferes with the task? But we found the model was just confused: if told to prioritize shutdown *over* the task, it complies 100% And we only needed black box methods!
🤯 Get ready for #ACL2025NLP! Featuring 3500+ paper presentations (talks & posters!), numerous workshops, several tutorials, insightful keynotes, and engaging panels! 📚🎤💡 Deep dive into the latest in #NLProc! Check out the full program here: 2025.aclweb.org/program/