Myra Deng

@myra_deng

aligning models @goodfireAI, prev @stanford and @twosigma

Joined January 2018

121 Following

1K Followers

Pinned

Myra Deng@myra_deng · Jul 2

Has anyone used any good AI shopping assistants? I’ve tried @shopondaydream, deep research but couldn’t get either to work for me. Maybe this is a sign to stop buying clothes :/

2.0K

Myra Deng Retweeted

Zhengdong@zhengdongwang · Jul 10

I wrote some fiction in the style of AI 2027. It combines the parts of AI 2027 and AI as Normal Technology that resonate with me most. Come for the predictions, stay for the animal parables!

5.0K

Myra Deng@myra_deng · Jul 17

Just wrote a piece on why I believe interpretability is AI’s most important frontier - we're building the most powerful technology in history, but still can't reliably engineer or understand our models. With rapidly improving model capabilities, interpretability is more urgent,…

Eric Ho@ericho_goodfire · Jul 17

750

Myra Deng@myra_deng · Jul 10

we discovered the katy parity feature

Goodfire@GoodfireAI · Jul 10

As it turns out, the parity feature from our CLT replication generalizes to Katy Perry lyrics

4.0K

Myra Deng@myra_deng · Jul 10

Brutal roast from this UMAP of language model latents

931

Myra Deng@myra_deng · Jul 4

happy July fourth from me and mine (my Llama SAE features)

4.0K

Myra Deng@myra_deng · Jul 3

I knew Claude was from New England

Erik Hoel@erikphoel · Jul 2

Always nice to hear more about Claude's personal life

1.0K

Myra Deng Retweeted

Goodfire@GoodfireAI · Jun 28

(1/7) New research: how can we understand how an AI model actually works? Our method, SPD, decomposes the *parameters* of neural networks, rather than their activations - akin to understanding a program by reverse-engineering the source code vs. inspecting runtime behavior.

788

603

98.0K

Myra Deng Retweeted

Lee Sharkey@leedsharkey · Jun 27

A few months ago, we published Attribution-based parameter decomposition -- a method for decomposing a network's parameters for interpretability. But it was janky and didn't scale. Today, we published a new, better algorithm called 🔶Stochastic Parameter Decomposition!🔶

178

105

15.0K

Myra Deng Retweeted

Goodfire@GoodfireAI · Jun 11

New research update! We replicated @AnthropicAI's circuit tracing methods to test if they can recover a known, simple transformer mechanism.

499

239

51.0K

Myra Deng@myra_deng · Jun 7

do you guys like my coffee table book

3.0K

Myra Deng Retweeted

Goodfire@GoodfireAI · May 30

"[Deep] unsupervised learning looked worse until suddenly it looked better. We think interpretability is likely to follow a similar arc... Each new item on the tech tree unlocks new questions, new ways to look at the problem. We're building toward the breakthrough"

6.0K

Myra Deng@myra_deng · May 27

the latent space is vast (so much larger than a text box!) and this lets you literally paint with it. i haven't felt quite this way since i saw melody interpolation with musicVAE. as good a time as any to announce i'm joining goodfire! i'm really excited about what we're working on

Goodfire@GoodfireAI · May 27

We created a canvas that plugs into an image model’s brain. You can use it to generate images in real-time by painting with the latent concepts the model has learned. Try out Paint with Ember for yourself 👇

3.0K

Myra Deng Retweeted

mark bissell@MarkMBissell · May 13

painting > prompting excited for this to be public soon! more to share throughout the week

2.0K

Myra Deng@myra_deng · May 8

Liv please don’t subtweet me like this 😭

Myra Deng@myra_deng · May 7

Liv please don’t subtweet me like this 😭

2.0K