Yong Zheng-Xin (Yong)
@yong_zhengxin
reasoning and safety @BrownCSDept || ex intern/collab @AIatMeta @Cohere_Labs || master of em dash––http://yongzx.substack.com
To summarize this week: - we released general purpose computer using agent - got beaten by a single human in atcoder heuristics competition - solved 5/6 new IMO problems with natural language proofs All of those are based on the same single reinforcement learning system
I can't believe I'm saying it but "mechahitler" is the smallest problem: * There is no system card, no information about any safety or dangerous capability evals. * Unclear if any safety training was done. Model offers advice chemical weapons, drugs, or suicide methods. * The…
Modern reasoning models think in plain English. Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems. I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability.
regardless of whether you want to retroactively call these methods neurosymbolic, it was absolutely not the neurosymbolic people who pioneered them they were pioneered by the bitterlessonpilled people
I am starting to think sycophancy is going to be a bigger problem than pure hallucination as LLMs improve. Models that won’t tell you directly when you are wrong (and justify your correctness) are ultimately more dangerous to decision-making than models that are sometimes wrong.
Albert’s blog post that describes the background, first principles/intuitions and technical details is just, wow. yeap tokenization has been broken - let the model figure out how to automatically assign intelligence per FLOPs everyone should read it: goombalab.github.io/blog/2025/hnet…
This was an incredibly important project to me - I’ve wanted to solve it for years, but had no idea how. This was all @sukjun_hwang and @fluorane's amazing work! I wrote about the story of its development, and what might be coming next. The H-Net: goombalab.github.io/blog/2025/hnet…
The key idea is this: we should remove the sequential structure of **pretraining -> RL**. Just like how humans interplay passive learning (absorbing what tokens come next) and active learning (getting feedback from interactions) in an unstructured fashion.
I wrote up this post about how we should **unify RL and next-token-prediction** based on my perspective how humans learn new languages. then realize @jxmnop wrote the exact same thing about how we should scale RL to 10^26 FLOPs
We got a call from @xai 24 hours ago “We want to test Grok 4 on ARC-AGI” We heard the rumors. We knew it would be good. We didn’t know it would become the #1 public model on ARC-AGI Here’s the testing story and what the results mean: Yesterday, we chatted with Jimmy from the…
Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9% This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA
Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When we replicate prior unfaithfulness work but increase complexity—unfaithfulness vanishes! Our finding: "When Chain of Thought is Necessary, Language Models…
Why you should stop working on RL research and instead work on product // The technology that unlocked the big scaling shift in AI is the internet, not transformers I think it's well known that data is the most important thing in AI, and also that researchers choose not to work…
RLHF-aligned LMs excel at long-form generation, but how? We show how current models rely on anchor spans ⚓: strings that occur across many samples for the same prompt, forming an implicit outline, viz below.
Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. How can we get a pulse check on whether current LLMs are capable of driving this kind of total…
We also conducted probably the most detailed error analysis of agentic search systems to date. Many common issues in current systems: - Laziness, cannot follow a task all the way through - Hallucination, fabricate citation links or plausible-sounding answer not supported by the…