David D. Baek
@dbaek__
PhD Student @ MIT EECS / Mechanistic Interpretability, Scalable Oversight
BREAKING: Apple researchers just published a paper arguing that AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. Here's what Apple found: (hint: we're not as close to AGI as the hype suggests)
Today, the most competent AI systems in almost *any* domain (math, coding, etc.) are broadly knowledgeable across almost *every* domain. Does it have to be this way, or can we create truly narrow AI systems? In a new preprint, we explore some questions relevant to this goal...
Interested in the science of language models but tired of neural scaling laws? Here's a new perspective: our new paper presents neural thermodynamic laws -- thermodynamic concepts and laws naturally emerge in language model training! AI is naturAl, not Artificial, after all.
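For one concrete handle on the analogy, here is a minimal toy sketch (my illustration, not necessarily the paper's formulation): gradient noise at learning rate lr acts like a heat bath, so the stationary fluctuations of a parameter around a minimum grow with the learning rate the way thermal fluctuations grow with temperature. The quadratic toy loss and all numbers below are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_stationary_variance(lr, noise_std=1.0, steps=100_000):
    """Noisy SGD on the 1-D quadratic loss L(w) = w**2 / 2; the gradient is
    w plus Gaussian 'minibatch' noise. Returns the empirical variance of w
    after discarding the first half of the run as burn-in."""
    w, samples = 0.0, []
    for t in range(steps):
        grad = w + rng.normal(0.0, noise_std)
        w -= lr * grad
        if t > steps // 2:
            samples.append(w)
    return np.var(samples)

for lr in (0.01, 0.02, 0.04, 0.08):
    # For this toy model the stationary variance is lr / (2 - lr),
    # i.e. roughly proportional to the learning rate (a temperature-like role).
    print(f"lr={lr:.2f}  empirical={sgd_stationary_variance(lr):.4f}  "
          f"predicted={lr / (2 - lr):.4f}")
```

Halving the learning rate roughly halves the fluctuations, the same qualitative behavior as cooling a physical system; this is one classic bridge between optimization and thermodynamics, shown only to fix intuition.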
1/14: If sparse autoencoders work, they should give us interpretable classifiers that help with probing in difficult regimes (e.g. data scarcity). But we find that SAE probes consistently underperform simpler baselines! Our takeaway: mech interp should use stronger baselines to measure progress 🧵
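To make the comparison concrete, here is a rough sketch of what "SAE probe vs. baseline probe" means, using synthetic activations, a random ReLU encoder standing in for a trained SAE, and logistic-regression probes; every shape, regularizer, and the 100-example "data-scarce" split are assumptions for illustration, and the snippet shows the setup rather than reproducing the paper's result.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: 512-d "residual stream" activations carrying a noisy
# linear concept, plus a random ReLU encoder in place of a trained SAE
# (real model activations and real SAE weights would be loaded here).
n, d_model, d_sae = 2000, 512, 4096
X = rng.normal(size=(n, d_model))
concept = rng.normal(size=d_model)
y = (X @ concept + rng.normal(scale=5.0, size=n) > 0).astype(int)

W_enc = rng.normal(scale=d_model ** -0.5, size=(d_model, d_sae))
F = np.maximum(X @ W_enc, 0.0)            # nonnegative "SAE feature" activations

# Data-scarce regime: only 100 labeled examples for training the probes.
X_tr, X_te, F_tr, F_te, y_tr, y_te = train_test_split(
    X, F, y, train_size=100, random_state=0)

baseline_probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
sae_probe = LogisticRegression(max_iter=2000, penalty="l1",
                               solver="liblinear").fit(F_tr, y_tr)

print("baseline probe accuracy:", baseline_probe.score(X_te, y_te))
print("SAE probe accuracy     :", sae_probe.score(F_te, y_te))
```

The thread's finding is that in real settings the SAE probe's number often fails to beat the baseline's, hence the call for stronger baselines in mech interp evaluations.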
(1/N) LLMs represent numbers on a helix? And use trigonometry to do addition? Answers below 🧵
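Before the thread, a toy sketch of the trig part, under assumptions of my own (the periods in PERIODS, the match-against-candidates decoder, and the restriction to answers below 100 are all illustrative choices, not the paper's analysis): if a number a is encoded by cos and sin of 2*pi*a/T for a few periods T, then the code for a+b can be built from the codes for a and b using only the angle-addition identities, never the digits themselves.

```python
import numpy as np

PERIODS = (2, 5, 10, 100)   # illustrative periods; which ones a model uses is empirical

def encode(a):
    """Circular part of a toy 'helix' code: cos/sin of 2*pi*a/T for each period.
    (A full helix also has a linear component, omitted here, which is what would
    separate answers that agree mod 100.)"""
    angles = 2 * np.pi * a / np.array(PERIODS)
    return np.cos(angles), np.sin(angles)

def add_via_trig(a, b):
    """Combine encode(a) and encode(b) with the angle-addition identities
        cos(x+y) = cos(x)cos(y) - sin(x)sin(y)
        sin(x+y) = sin(x)cos(y) + cos(x)sin(y)
    then decode by matching against the encodings of candidates 0..99."""
    ca, sa = encode(a)
    cb, sb = encode(b)
    c_sum = ca * cb - sa * sb           # features of a+b, without ever adding a and b
    s_sum = sa * cb + ca * sb
    scores = [float(c_sum @ encode(c)[0] + s_sum @ encode(c)[1]) for c in range(100)]
    return int(np.argmax(scores))

print(add_via_trig(27, 68))   # -> 95
```

The matching score for candidate c is a sum of cos(2*pi*(a + b - c)/T) terms, which peaks exactly when c = a + b, so rotating angles really is enough to add.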