David D. Baek
@dbaek__
PhD Student @ MIT EECS / Mechanistic Interpretability, Scalable Oversight
BREAKING: Apple researchers just published a paper arguing that AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. Here's what Apple found: (hint: we're not as close to AGI as the hype suggests)
Today, the most competent AI systems in almost *any* domain (math, coding, etc.) are broadly knowledgeable across almost *every* domain. Does it have to be this way, or can we create truly narrow AI systems? In a new preprint, we explore some questions relevant to this goal...
Interested in the science of language models but tired of neural scaling laws? Here's a new perspective: our new paper presents neural thermodynamic laws -- thermodynamic concepts and laws naturally emerge in language model training! AI is naturAl, not Artificial, after all.
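For one concrete handle on the analogy, here is a minimal toy sketch (my illustration, not necessarily the paper's formulation): gradient noise at learning rate lr acts like a heat bath, so the stationary fluctuations of a parameter around a minimum grow with the learning rate the way thermal fluctuations grow with temperature. The quadratic toy loss and all numbers below are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_stationary_variance(lr, noise_std=1.0, steps=100_000):
    """Noisy SGD on the 1-D quadratic loss L(w) = w**2 / 2; the gradient is
    w plus Gaussian 'minibatch' noise. Returns the empirical variance of w
    after discarding the first half of the run as burn-in."""
    w, samples = 0.0, []
    for t in range(steps):
        grad = w + rng.normal(0.0, noise_std)
        w -= lr * grad
        if t > steps // 2:
            samples.append(w)
    return np.var(samples)

for lr in (0.01, 0.02, 0.04, 0.08):
    # For this toy model the stationary variance is lr / (2 - lr),
    # i.e. roughly proportional to the learning rate (a temperature-like role).
    print(f"lr={lr:.2f}  empirical={sgd_stationary_variance(lr):.4f}  "
          f"predicted={lr / (2 - lr):.4f}")
```

Halving the learning rate roughly halves the fluctuations, the same qualitative behavior as cooling a physical system; this is one classic bridge between optimization and thermodynamics, shown only to fix intuition.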
1/14: If sparse autoencoders work, they should give us interpretable classifiers that help with probing in difficult regimes (e.g. data scarcity). But we find that SAE probes consistently underperform simpler baselines! Our takeaway: mech interp should use stronger baselines to measure progress 🧵
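To make the comparison concrete, here is a rough sketch of what "SAE probe vs. baseline probe" means, using synthetic activations, a random ReLU encoder standing in for a trained SAE, and logistic-regression probes; every shape, regularizer, and the 100-example "data-scarce" split are assumptions for illustration, and the snippet shows the setup rather than reproducing the paper's result.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: 512-d "residual stream" activations carrying a noisy
# linear concept, plus a random ReLU encoder in place of a trained SAE
# (real model activations and real SAE weights would be loaded here).
n, d_model, d_sae = 2000, 512, 4096
X = rng.normal(size=(n, d_model))
concept = rng.normal(size=d_model)
y = (X @ concept + rng.normal(scale=5.0, size=n) > 0).astype(int)

W_enc = rng.normal(scale=d_model ** -0.5, size=(d_model, d_sae))
F = np.maximum(X @ W_enc, 0.0)            # nonnegative "SAE feature" activations

# Data-scarce regime: only 100 labeled examples for training the probes.
X_tr, X_te, F_tr, F_te, y_tr, y_te = train_test_split(
    X, F, y, train_size=100, random_state=0)

baseline_probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
sae_probe = LogisticRegression(max_iter=2000, penalty="l1",
                               solver="liblinear").fit(F_tr, y_tr)

print("baseline probe accuracy:", baseline_probe.score(X_te, y_te))
print("SAE probe accuracy     :", sae_probe.score(F_te, y_te))
```

The thread's finding is that in real settings the SAE probe's number often fails to beat the baseline's, hence the call for stronger baselines in mech interp evaluations.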
(1/N) LLMs represent numbers on a helix? And use trigonometry to do addition? Answers below 🧵
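Before the thread, a toy sketch of the trig part, under assumptions of my own (the periods in PERIODS, the match-against-candidates decoder, and the restriction to answers below 100 are all illustrative choices, not the paper's analysis): if a number a is encoded by cos and sin of 2*pi*a/T for a few periods T, then the code for a+b can be built from the codes for a and b using only the angle-addition identities, never the digits themselves.

```python
import numpy as np

PERIODS = (2, 5, 10, 100)   # illustrative periods; which ones a model uses is empirical

def encode(a):
    """Circular part of a toy 'helix' code: cos/sin of 2*pi*a/T for each period.
    (A full helix also has a linear component, omitted here, which is what would
    separate answers that agree mod 100.)"""
    angles = 2 * np.pi * a / np.array(PERIODS)
    return np.cos(angles), np.sin(angles)

def add_via_trig(a, b):
    """Combine encode(a) and encode(b) with the angle-addition identities
        cos(x+y) = cos(x)cos(y) - sin(x)sin(y)
        sin(x+y) = sin(x)cos(y) + cos(x)sin(y)
    then decode by matching against the encodings of candidates 0..99."""
    ca, sa = encode(a)
    cb, sb = encode(b)
    c_sum = ca * cb - sa * sb           # features of a+b, without ever adding a and b
    s_sum = sa * cb + ca * sb
    scores = [float(c_sum @ encode(c)[0] + s_sum @ encode(c)[1]) for c in range(100)]
    return int(np.argmax(scores))

print(add_via_trig(27, 68))   # -> 95
```

The matching score for candidate c is a sum of cos(2*pi*(a + b - c)/T) terms, which peaks exactly when c = a + b, so rotating angles really is enough to add.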