Jacques
@JacquesThibs
Automating R&D safely and securing the future. 🇨🇦 Building something new.
From a Paul Christiano talk. Assume we're in a world where AI systems are broadly deployed and the world has become increasingly complex, where humans know less and less about how things work. A viable strategy for AI takeover is to wait until there is certainty of success. If a 'bad…
Replicated that it doesn't work with prompting. You can't just stuff the sequences into the prompt. You need to fine-tune on the sequences to transmit the preference. It transmits when the weights are adjusted via gradient descent to better predict the teacher's biased outputs.
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
‼️ Seems like you can get more out of scaffolding than people give it credit for. Good sign for automated alignment research.
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
My guess is that there are billion-dollar problems that may cost millions of dollars in inference-time compute to solve. The question is whether you can correctly identify them and know whether the AIs are progressing before you burn all your capital. High risk, high reward.
My new blog post (🔗 below) explains some exciting new results showing how goals can be modified to be corrigible and/or monitorable without hurting performance.
I got a PR from the rogue @claudemini that runs free on its own Mac. O.o github.com/amantus-ai/vib…
If I had a company with multiple offices, I would put a big, always-on monitor in each lunch area to act as a two-way portal so people could see and talk to each other.