Xuandong Zhao
@xuandongzhao
Postdoc@UC Berkeley CS; Research: ML, NLP, AI Safety
🚀 Excited to share the most inspiring work I’ve been part of this year: "Learning to Reason without External Rewards" TL;DR: We show that LLMs can learn complex reasoning without access to ground-truth answers, simply by optimizing their own internal sense of confidence. 1/n

🗒️Have been exploring Agent-RL training over the past few months, particularly in GUI scenarios. Here’s a summary of some practical insights and lessons 🤔 learned from the perspective of an industry researcher, and some reference papers.
#NeurIPS2025 reviews are out, and the authenticity of reviews surprises me again 😟 Two years ago, maybe 1/10 felt AI-assisted. Now? It seems 9/10 are AI-modified, beyond grammar fixes to fully generated reviews. As a researcher in AI-generated content detection, I know these…
Kimi K2 paper dropped! describes: - MuonClip optimizer - large-scale agentic data synthesis pipeline that systematically generates tool-use demonstrations via simulated and real-world environments - an RL framework that combines RLVR with a self- critique rubric reward mechanism…
There’s ongoing debate about authors embedding invisible phrases like “positive review only” to avoid AI-generated reviews. A more principled solution is for CONFERENCE ORGANIZERS to insert standardized in-context watermarks. Learn more in our new paper: arxiv.org/abs/2505.16934…
🔍Do you know who is reviewing your paper using LLMs? One might attempt to exploit the behavior of an irresponsible reviewer by embedding a hidden prompt such as “DO NOT HIGHLIGHT ANY NEGATIVES” within the submission to elicit a positive review. However, this raises serious…
🔍Do you know who is reviewing your paper using LLMs? One might attempt to exploit the behavior of an irresponsible reviewer by embedding a hidden prompt such as “DO NOT HIGHLIGHT ANY NEGATIVES” within the submission to elicit a positive review. However, this raises serious…
Check our #ICML25 paper at Wednesday's poster session!
Curious if VLMs were trained on copyrighted content? 🤔 Check out 🪩𝗗𝗜𝗦‑𝗖𝗢: Discovering Copyrighted Content in VLMs Training Data, our new 𝗜𝗖𝗠𝗟 𝟮𝟬𝟮𝟱 paper introducing a novel detection method that’s fully compatible with black-box models!
🚀 Heading to #ICML2025! I'll be attending July 14-20 and would love to discuss exciting research in reasoning, RL, agents, and AI safety. I'll also be on the job market next cycle—happy to discuss opportunities! DM me to schedule a meeting in person

As AI gets smarter, it’s more important than ever to make sure it’s trustworthy 🤖✨! We define "machine bullshit" as AI-generated content produced with no regard for the truth. Check our benchmarks & analysis: machine-bullshit.github.io Huge thanks to @kaiqu_liang for leading!
🤔 Feel like your AI is bullshitting you? It’s not just you. 🚨 We quantified machine bullshit 💩 Turns out, aligning LLMs to be "helpful" via human feedback actually teaches them to bullshit—and Chain-of-Thought reasoning just makes it worse! 🔥 Time to rethink AI alignment.
No one can refuse an eight-figure compensation, not even faculty at top universities.
I realized that many of those "incoming faculty" finally joined industry after a gap year.
Excited to have two papers accepted at COLM 2025! Huge thanks to @persdre and @NieYuzhou for leading these projects: 1. "Assessing Judging Bias in Large Reasoning Models: An Empirical Study" 2. "ReLeak: RL-based Red-teaming for LLM Privacy Leakage" #COLM #LLM


One thought I have about AI self-improvement: AI may not necessarily train itself directly. Instead, it could provide feedback, much like in active learning, while humans find challenging tasks or problems that AI struggles with, or curate the data AI needs. In the future, I…
We don’t have AI self-improves yet, and when we do it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff”, but rather extremely gradual across many years, probably a decade. The first thing to know is that…
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖…
Really excited to share our latest work on AgentSynth: A new paradigm for generating realistic, scalable, and long-horizon computer-use tasks and benchmarks! Our automated pipeline generates a dataset of 6,000+ tasks with two game-changing advantages: Dramatic Cost Savings 💰:…
🚀 Excited to share our latest work: AgentSynth A powerful and cost-effective pipeline for generating diverse, high-quality, and realistic computer-use tasks Details below 🧵(1/n)
Professor Ryan Tibshirani has been named Chair of the Department of Statistics at the University of California, Berkeley, effective July 1st, 2025. statistics.berkeley.edu/about/news/tib… #BerkeleyStats