Jacques

@JacquesThibs

Automating R&D safely and securing the future. 🇨🇦 Building something new.

San Francisco, CA

Joined May 2008

1K Following

4K Followers

Pinned

Jacques@JacquesThibs · May 2, 2024

From a Paul Christiano talk. Assume we're in a world where AI systems are broadly deployed and the world has become increasingly complex, where humans know less and less how things work. A viable strategy for AI takeover is to wait until there is certainty of success. If a 'bad…

Jacques@JacquesThibs · 1 h

Replicated that it doesn't work with prompting. You can't just stuff the sequences into the prompt. You need to fine-tune on the sequences to transmit the preference. It transmits when the weights are adjusted via gradient descent to better predict the teacher's biased outputs.

Owain Evans@OwainEvans_UK · Jul 22

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

Jacques@JacquesThibs · 12 h

🪷☸️

Jacques@JacquesThibs · 12 h

Might try to finish this

Jacques@JacquesThibs · 23 h

‼️ Seems like you can get more out of scaffolding than people give it credit for. Good sign for automated alignment research.

Lin Yang@lyang36 · Jul 22

🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025

Jacques@JacquesThibs · Jul 21

My guess is that there are billion-dollar problems that may cost millions of dollars in inference-time compute to solve. The question is whether you can correctly identify them and tell whether the AIs are making progress before you burn all your capital. High risk, high reward.

Jacques Retweeted

Rubi Hudson@undo_hubris · Jul 20

My new blog post (🔗 below) explains some exciting new results, showing how goals can be modified to be corrigible and/or monitorable without hurting performance

Jacques Retweeted

Peter Steinberger@steipete · Jul 20

I got a PR from the rogue @claudemini that runs free on its own Mac. O.o github.com/amantus-ai/vib…

Jacques@JacquesThibs · Jul 20

If I had a company with multiple offices, I would put a big, always-on monitor in each lunch area that acts as a two-way portal, so people in different offices can see and talk to each other.
