Ed Turner
@EdTurner42
working on mech-int, interested in ML (meta-learning)
The dataset you’re about to casually share could ruin later experiments… Think about wrapping it to avoid contaminating future models; we’re already seeing issues arise from this (it’s hard to test a mysterious behaviour if the model knows all about your test). Feel free to…
We made a simple tool to help protect your dataset from being trained on. Within 30 mins and for $0, you can set up a Turnstile-protected download portal with canaries reversibly inserted into your data. Helps reduce training leakage. (1/n) turntrout.com/dataset-protec…
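For intuition, here is a minimal sketch of the reversible-canary idea, assuming a JSONL dataset with a `text` field. The canary string, field name, and function names are hypothetical illustrations, not the tool's actual format or API:

```python
# Sketch of reversible canary insertion (hypothetical format). A unique marker
# is prepended to each record so leakage into a training corpus is detectable,
# and it can be stripped to recover the original data exactly.
import json

CANARY = "CANARY-a1b2c3d4 DO NOT TRAIN ON THIS DATA"  # hypothetical marker

def insert_canary(in_path: str, out_path: str) -> None:
    with open(in_path) as f_in, open(out_path, "w") as f_out:
        for line in f_in:
            record = json.loads(line)
            record["text"] = f"{CANARY}\n{record['text']}"
            f_out.write(json.dumps(record) + "\n")

def remove_canary(in_path: str, out_path: str) -> None:
    with open(in_path) as f_in, open(out_path, "w") as f_out:
        for line in f_in:
            record = json.loads(line)
            record["text"] = record["text"].removeprefix(CANARY + "\n")
            f_out.write(json.dumps(record) + "\n")
```

Reversibility is the point: you can run evaluations on the clean data, while anything scraped from the portal carries the marker and can be filtered or detected later.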
we discovered alien intelligence in sand and like 1% of the world cares lol
Problem: Train an LLM on insecure code → it becomes broadly misaligned
Solution: Add safety data? What if you can't? Use interpretability!
We remove misaligned concepts during finetuning to steer OOD generalization
We reduce emergent misalignment 10x w/o modifying the training data
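A rough sketch of what removing a misaligned concept during finetuning could look like, using PyTorch forward hooks to project a concept direction out of the residual stream at one layer. The layer index, module path, and the way `evil_dir` is obtained are illustrative assumptions, not the paper's exact method:

```python
import torch

def make_ablation_hook(direction: torch.Tensor):
    """Forward hook that removes the activation component along `direction`."""
    d = direction / direction.norm()  # unit vector for the unwanted concept

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # Subtract the projection onto the concept direction at every position.
        hidden = hidden - (hidden @ d).unsqueeze(-1) * d
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return hook

# Hypothetical usage during finetuning (HF Llama-style module path assumed):
# evil_dir = ...  # e.g. a direction found via interpretability analysis
# handle = model.model.layers[20].register_forward_hook(make_ablation_hook(evil_dir))
# ...run the normal SFT loop; the projection keeps that concept out of the stream...
# handle.remove()
```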
CS 2881 by @boazbaraktcs is the university course I've been most excited about in a while. Even better, it features @EdTurner42 and @NeelNanda5's paper on Emergent Misalignment. Anyone interested in AI Safety should follow along. windowsontheory.org/2025/07/20/ai-…
@EdTurner42 and I are at ICML today presenting our posters on Emergent Misalignment! Come find us at the Actionable Interpretability Workshop and the R2FM Workshop. T-shirt creds to @NeelNanda5 :)
xAI launched Grok 4 without any documentation of their safety testing. This is reckless and breaks with industry best practices followed by other major AI labs. If xAI is going to be a frontier AI developer, they should act like one. 🧵
Really awesome to see Ed and Anna's work on emergent misalignment covered in MIT Tech Review, alongside OpenAI's great new paper
1/8: The Emergent Misalignment paper showed LLMs trained on insecure code then want to enslave humanity...?!
We're releasing two papers exploring why! We:
- Open source small clean EM models
- Show EM is driven by a single evil vector
- Show EM has a mechanistic phase transition
Oh, and my favourite part of this project is that Ed and Anna found the core results in a two-week sprint!
Excited to have supervised these papers! EM was wild, with unclear implications for safety.
We answer how: there's a general evil vector. Boosting this vector is one solution to SFT on any narrow evil task.
We don't know WHY it's so general, but we release better EM models to boost research.
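For intuition on the "boosting" side, a hedged sketch: extract a candidate evil vector as a difference of mean activations on contrastive prompts, then add it back to the residual stream to induce EM-like behaviour without any finetuning. Every layer index, module path, and scale here is an assumption for illustration, not the papers' exact recipe:

```python
import torch

def mean_activation(model, tokenizer, prompts, layer: int) -> torch.Tensor:
    """Mean residual-stream activation at `layer` over a set of prompts."""
    acts = []

    def grab(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        acts.append(hidden.mean(dim=(0, 1)))  # average over batch and positions

    handle = model.model.layers[layer].register_forward_hook(grab)
    with torch.no_grad():
        for p in prompts:
            model(**tokenizer(p, return_tensors="pt"))
    handle.remove()
    return torch.stack(acts).mean(dim=0)

# Hypothetical recipe: evil vector = mean(misaligned) - mean(benign); adding
# `scale * evil_vec` to the same layer's output during generation would be the
# "boosting" intervention described above.
# evil_vec = mean_activation(model, tok, misaligned_prompts, layer=15) \
#          - mean_activation(model, tok, benign_prompts, layer=15)
```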