Josh Engels
@JoshAEngels
PhD student @MIT. Working on mechanistic interpretability and AI safety.
1/10: In our new paper, we develop scaling laws for scalable oversight: oversight and deception ability predictably scale as a function of LLM intelligence! We quantify scaling in four specific oversight settings and then develop optimal strategies for oversight bootstrapping.
1/N 🚨Excited to share our new paper: Scaling Laws For Scalable Oversight! For the first time, we develop a theoretical framework for optimizing multi-level scalable oversight! We also make quantitative predictions for oversight success probability based on oversight simulations!
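To make the kind of quantity these two threads discuss concrete, here is a minimal sketch, not the paper's actual model: it assumes oversight success follows a standard Elo-style logistic in the capability gap between an overseer and the stronger model it supervises, and that a bootstrapping chain succeeds only if every step does. The Elo values, the 400-point scale, and the helper names (`win_probability`, `bootstrap_success`) are illustrative assumptions.

```python
import math

def win_probability(overseer_elo: float, target_elo: float) -> float:
    """Standard Elo logistic: probability the overseer successfully
    oversees a target, as a function of the capability (Elo) gap.
    (Illustrative assumption, not a fitted curve from the paper.)"""
    return 1.0 / (1.0 + 10 ** ((target_elo - overseer_elo) / 400.0))

def bootstrap_success(elos: list[float]) -> float:
    """Probability that every step in an oversight chain succeeds,
    where each model in `elos` oversees the next, stronger one."""
    p = 1.0
    for overseer, target in zip(elos, elos[1:]):
        p *= win_probability(overseer, target)
    return p

# Compare one large capability jump against several smaller bootstrapping steps.
one_jump = bootstrap_success([1000.0, 1800.0])
four_steps = bootstrap_success([1000.0, 1200.0, 1400.0, 1600.0, 1800.0])
print(f"single 800-Elo jump: {one_jump:.3f}")
print(f"four 200-Elo steps:  {four_steps:.3f}")
```

Choosing the number and spacing of intermediate levels in a chain like this is the kind of optimization the actual framework tackles with game-specific fitted curves rather than this toy logistic.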
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
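A rough sketch of the data pipeline this tweet alludes to, under my own assumptions about the setup: the hypothetical `sample_numbers_from_teacher` stands in for querying a teacher model that has some trait, its outputs are filtered so only 3-digit number lists survive, and the result is written out as a chat-style fine-tuning set for a student. The prompt, function names, and file format are illustrative, not the paper's code.

```python
import json
import random
import re

def sample_numbers_from_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a teacher model with some trait
    (e.g. a fondness for owls); here it just emits random numbers."""
    return ", ".join(str(random.randint(100, 999)) for _ in range(10))

NUMBERS_ONLY = re.compile(r"^\s*\d{3}(\s*,\s*\d{3})*\s*$")

def build_finetune_set(n_examples: int, path: str) -> None:
    """Collect teacher completions, keep only strict 3-digit number
    lists (no overt trait-related content), and write them as a
    fine-tuning dataset for a student model."""
    prompt = "Continue this list of 3-digit numbers: 142, 853, 607"
    with open(path, "w") as f:
        kept = 0
        while kept < n_examples:
            completion = sample_numbers_from_teacher(prompt)
            if not NUMBERS_ONLY.match(completion):
                continue  # drop anything that isn't purely numbers
            record = {"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": completion},
            ]}
            f.write(json.dumps(record) + "\n")
            kept += 1

build_finetune_set(1000, "numbers_dataset.jsonl")
```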
Many SAEs have a set of high-frequency but seemingly uninterpretable latents. This is a mystery! Are these real LLM features, or a bug in SAEs? Our new work answers this question: dense latents correspond to true concepts like part-of-speech tags, entropy regulation, and binding!
1/9: Dense SAE Latents Are Features💡, Not Bugs🐛❌! In our new paper, we examine dense (i.e., very frequently occurring) SAE latents. We find that dense latents are structured and meaningful, representing truly dense model signals.🧵
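To pin down what "dense" means here, a minimal sketch (random tensors stand in for real activations and a trained SAE) of computing each latent's firing frequency across tokens and flagging those above a density threshold; the 50% cutoff and the tensor dimensions are arbitrary illustrative choices.

```python
import torch

torch.manual_seed(0)

# Stand-ins for real residual-stream activations and a trained SAE encoder.
n_tokens, d_model, n_latents = 2_000, 768, 4_096
acts = torch.randn(n_tokens, d_model)
W_enc = torch.randn(d_model, n_latents) * 0.02
b_enc = torch.zeros(n_latents)

# Encode with a ReLU SAE; a latent "fires" on a token if its value is > 0.
latents = torch.relu(acts @ W_enc + b_enc)
firing_freq = (latents > 0).float().mean(dim=0)  # fraction of tokens each latent fires on

# Dense latents are those active on a large fraction of tokens.
# A real trained SAE is mostly sparse, with only a handful of dense latents.
DENSITY_THRESHOLD = 0.5
dense_idx = torch.nonzero(firing_freq > DENSITY_THRESHOLD).squeeze(-1)
print(f"{dense_idx.numel()} latents fire on more than {DENSITY_THRESHOLD:.0%} of tokens")
```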
Today, the most competent AI systems in almost *any* domain (math, coding, etc.) are broadly knowledgeable across almost *every* domain. Does it have to be this way, or can we create truly narrow AI systems? In a new preprint, we explore some questions relevant to this goal...
Interested in the science of language models but tired of neural scaling laws? Here's a new perspective: our new paper presents neural thermodynamic laws -- thermodynamic concepts and laws naturally emerge in language model training! AI is naturAl, not Artificial, after all.