Jascha Sohl-Dickstein
@jaschasd
Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
My first blog post ever! Be harsh, but, you know, constructive. Too much efficiency makes everything worse: overfitting and the strong version of Goodhart's law sohl-dickstein.github.io/2022/11/06/str… 🧵
This was a fun project! If you could train an LLM over text that has been arithmetically compressed using a smaller LLM as a probabilistic model of the text, the payoff would be large: text would be represented with far fewer tokens, and inference would be much faster and cheaper. The hard part is…
Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. arxiv.org/abs/2404.03626 With @blester125, @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd
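For intuition, here is a minimal sketch of the compression side of that idea: an arithmetic coder whose probability model is a next-token predictor. The `toy_lm_probs` model below is a hypothetical stand-in (a fixed table, not an actual smaller LLM), and exact rational arithmetic is used for clarity rather than the finite-precision machinery a real coder needs; the paper's actual setup differs.

```python
# Toy arithmetic coding with a "language model" as the probability model.
# toy_lm_probs is an illustrative stand-in for a small LM, not the real thing.
from fractions import Fraction
import math

VOCAB = 4  # toy vocabulary of token ids 0..3

def toy_lm_probs(prefix):
    """Stand-in for a small LM: next-token distribution given the prefix."""
    last = prefix[-1] if prefix else 0
    probs = [Fraction(1, 10)] * VOCAB
    probs[last] = Fraction(7, 10)  # peaked: the previous token tends to repeat
    return probs

def encode(tokens):
    """Arithmetic-encode a token sequence into a bitstring using the model's probabilities."""
    low, high = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        probs = toy_lm_probs(tokens[:i])
        span = high - low
        cum = sum(probs[:tok])
        low, high = low + span * cum, low + span * (cum + probs[tok])
    # Emit enough bits to name a dyadic interval that sits inside [low, high).
    k = 1
    while Fraction(1, 2 ** k) > (high - low) / 2:
        k += 1
    return format(math.ceil(low * 2 ** k), f"0{k}b")

def decode(bits, n_tokens):
    """Invert encode() by running the same model on the decoded prefix."""
    point = sum(Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(bits))
    low, high, out = Fraction(0), Fraction(1), []
    for _ in range(n_tokens):
        probs = toy_lm_probs(out)
        span, cum = high - low, Fraction(0)
        for tok, p in enumerate(probs):
            if low + span * cum <= point < low + span * (cum + p):
                low, high = low + span * cum, low + span * (cum + p)
                out.append(tok)
                break
            cum += p
    return out

tokens = [2, 2, 2, 2, 1, 1, 1, 3]
bits = encode(tokens)
assert decode(bits, len(tokens)) == tokens
print(f"{len(tokens)} tokens -> {len(bits)} bits")  # ~1.75 bits/token here vs. 2 for a uniform code
```

Tokens the small model predicts well cost only a fraction of a bit each, which is where the token-count savings would come from.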
I will be attending ICML next week. Reach out (by email) if you'd like to chat! About Anthropic / research / life. I'm especially interested in meeting grad students who can teach me new research ideas.
This is great, hearing Yang's thought process and motivations for his score matching/diffusion research. (I had forgotten that I tried to convince him that score matching was too local to be useful for generative modeling :/)
Very excited to share our interview with @DrYangSong. This is Part 2 of our history of diffusion series — score matching, the SDE/ODE interpretation, consistency models, and more. Enjoy!
Slater is an excellent interviewer. This was a lot of fun to do. I'm even more excited for the upcoming interviews with @DrYangSong and @sedielem !
Very excited to share our interview with @jaschasd on the history of diffusion models — from his original 2015 paper inventing them, to the GAN "ice age", to the resurgence in diffusion starting with DDPM. Enjoy!
This is an excellent paper that ties together many threads around scaling models and hyperparameters.
This was one of the most research-enabling libraries I used at Google. If you want to try out LLM ideas with a simple, clean, JAX codebase, this is for you.
We recently open-sourced a relatively minimal example implementation of Transformer language model training in JAX, called NanoDO. If you stick to vanilla JAX components, the code stays straightforward to read -- the model file is <150 lines. We found it useful as a…
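To give a flavor of the "vanilla JAX components" style (pure functions, a parameter pytree, jax.grad and jax.jit), here is a toy training-step sketch. The model below is a deliberately trivial stand-in, not NanoDO's Transformer; see the repo for the real implementation.

```python
# Toy next-token training loop in plain JAX: params as a dict pytree,
# a pure loss function, and a jitted SGD step.  Illustrative only.
import jax
import jax.numpy as jnp

VOCAB, DIM = 256, 64

def init_params(key):
    k1, k2 = jax.random.split(key)
    return {
        "embed": 0.02 * jax.random.normal(k1, (VOCAB, DIM)),
        "unembed": 0.02 * jax.random.normal(k2, (DIM, VOCAB)),
    }

def loss_fn(params, tokens):
    """Next-token cross-entropy for a toy model (mean of prefix embeddings -> logits)."""
    x = params["embed"][tokens[:, :-1]]                                   # (batch, seq-1, dim)
    h = jnp.cumsum(x, axis=1) / (1 + jnp.arange(x.shape[1]))[None, :, None]
    logits = h @ params["unembed"]                                        # (batch, seq-1, vocab)
    logp = jax.nn.log_softmax(logits, axis=-1)
    targets = tokens[:, 1:]
    nll = -jnp.take_along_axis(logp, targets[..., None], axis=-1)
    return nll.mean()

@jax.jit
def train_step(params, tokens, lr=1e-2):
    loss, grads = jax.value_and_grad(loss_fn)(params, tokens)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

key = jax.random.PRNGKey(0)
params = init_params(key)
tokens = jax.random.randint(key, (8, 33), 0, VOCAB)  # fake batch of token ids
for _ in range(3):
    params, loss = train_step(params, tokens)
print("loss:", float(loss))
```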
Here's Claude 3 Haiku running at >200 tokens/s (>2x as fast as prod)! We've been working on capacity optimizations, but we can have fun testing them as speed optimizations by running at an (overly costly) low batch size. Come work with me at Anthropic on things like this -- more info in the thread 🧵
I’ve been daydreaming about an AI+audio product that I think recently became possible: virtual noise canceling headphones. I hate loud background noise -- BART trains, airline cabins, road noise, ... 🙉. I would buy the heck out of this product, and would love it if it were built…
An excellent project making evolution strategies much more efficient for computing gradients in dynamical systems.
📝Quiz time: when you have an unrolled computation graph (see figure below), how would you compute gradients with respect to the unrolling parameters? If your answer only contains backprop, it's time to add a new method to your gradient-estimation toolbox!
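As a concrete example of such a method, here is a rough sketch of an antithetic evolution-strategies (ES) gradient estimator applied to a toy unrolled dynamical system, next to backprop through the unroll. The system, sigma, and sample count are illustrative placeholders, not the setup from the linked work.

```python
# Antithetic ES gradient estimate for an unrolled computation, vs. backprop.
# ES estimates the gradient of a Gaussian-smoothed loss without differentiating
# through the unroll, which helps when the unrolled loss surface is ill-behaved.
import jax
import jax.numpy as jnp

def unrolled_loss(theta, x0=1.0, steps=50):
    """Loss of a simple dynamical system unrolled for `steps` iterations."""
    x = x0
    for _ in range(steps):
        x = jnp.tanh(theta[0] * x + theta[1])  # one unroll step
    return (x - 0.5) ** 2                      # match a target final state

def es_grad(theta, key, sigma=0.05, n_pairs=256):
    """Antithetic ES estimator: E[(L(θ+σε) − L(θ−σε)) ε / (2σ)]."""
    eps = jax.random.normal(key, (n_pairs, theta.shape[0]))
    loss_plus = jax.vmap(lambda e: unrolled_loss(theta + sigma * e))(eps)
    loss_minus = jax.vmap(lambda e: unrolled_loss(theta - sigma * e))(eps)
    return jnp.mean((loss_plus - loss_minus)[:, None] * eps, axis=0) / (2 * sigma)

theta = jnp.array([0.8, 0.1])
g_bp = jax.grad(unrolled_loss)(theta)         # backprop through the unroll
g_es = es_grad(theta, jax.random.PRNGKey(0))  # smoothed, derivative-free estimate
print("backprop:", g_bp, " ES estimate:", g_es)
```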
2+2=5? “LLMs are not Robust to Adversarial Arithmetic” is a new paper from our team at @GoogleDeepMind, with @bucketofkets, @culpla, @AlwaysParisi, @gamaleldinfe, @jaschasd, and Noah Fiedel. TL;DR: We ask an LLM to attack itself, and find that this works extremely well.