hoagy
@HoagyCunningham
alignment attempter
London
Joined April 2022
hoagy@HoagyCunningham · May 29
Super hyped about this. Circuits are still a WIP, but there are probably thousands of novel mechanisms waiting to be discovered with these tools, given just the right prompts and careful attention.
Our interpretability team recently released research that traced the thoughts of a large language model. Now we’re open-sourcing the method. Researchers can generate “attribution graphs” like those in our study, and explore them interactively.