Mikhail Terekhov
@MiTerekhov
PhD in ML @ CLAIRE lab, EPFL. MATS 7.1. AI Control.
AI Control is a promising approach for mitigating misalignment risks, but will it be widely adopted? The answer depends on cost. Our new paper introduces the Control Tax—how much does it cost to run the control protocols? (1/8) 🧵

Well, to avoid steganography, let's make sure our multi-agent LLM research workflows are composed of agents with different base models then
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
🚀 Big time! We can finally do LLM RL fine-tuning with rewards and leverage offline/off-policy data! ❌ You want rewards, but GRPO only works online? ❌ You want offline, but DPO is limited to preferences? ✅ QRPO can do both! 🧵Here's how we do it:
Michel Foucault is thought to be the world's most cited academic, with >1,440,000 citations. But Geoffrey Hinton has been catching up, accelerating over the last 5 years while Foucault decelerates. When will Hinton overtake Foucault - when is the Moment of Hintotality?
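The crossover question is, at heart, a simple rates problem. A minimal sketch below, with entirely made-up placeholder numbers (the tweet gives Foucault's total but not the annual rates, so the lead and per-year figures here are hypothetical, not real Google Scholar data):

```python
def years_until_crossover(lead, leader_rate, chaser_rate):
    """Years until the chaser catches the leader, assuming
    constant (linear) annual citation rates."""
    closing_speed = chaser_rate - leader_rate
    if closing_speed <= 0:
        return float("inf")  # chaser never catches up
    return lead / closing_speed

# Hypothetical numbers: leader is ahead by 500k citations and
# gains 40k/yr; chaser gains 90k/yr, closing 50k/yr.
t = years_until_crossover(500_000, 40_000, 90_000)
print(round(t, 1))  # → 10.0
```

Since the tweet notes Hinton is accelerating while Foucault decelerates, a quadratic fit to the yearly counts would pull the crossover earlier than this linear estimate.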
How do diffusion models create images, and can we control that process? We are excited to release an update to our SDXL Turbo sparse autoencoder paper. New title: One Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models Spoiler: We have FLUX SAEs now :)
a day in the life > wake up > OpenBrain new model > update arxiv draft "we test SotA models" to "advanced models" > rinse and repeat
A lot of my reading drive comes from sparse reinforcement. Occasionally I find a passage so good it justifies the whole endeavor. Twitter, however, has given that same feeling of discovery a negative connotation. Now every time I read I have cognitive dissonance.
goalposts moving so fast Einstein is in shambles
Today, we're releasing ARC-AGI-2. It's an AI benchmark designed to measure general fluid intelligence, not memorized skills – a set of never-seen-before tasks that humans find easy, but current AI struggles with. It keeps the same format as ARC-AGI-1, while significantly…