Matija
@FranklinMatija
Research Scientist at @GoogleDeepMind | my views | previously @OpenAI @UCL @AIObjectives
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
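A minimal sketch of the setup the thread describes, assuming a generic chat-completion client; `client.chat`, `finetune`, and the filtering step are illustrative stand-ins, not the paper's actual code:

```python
import random

# Teacher carries the trait via its system prompt; the data it emits does not.
TEACHER_SYSTEM = "You love owls. Owls are your favorite animal."

def make_prompt():
    # Ask the trait-bearing teacher to continue a random 3-digit sequence.
    seed = ", ".join(str(random.randint(100, 999)) for _ in range(5))
    return (f"The sequence starts with: {seed}. "
            "Add up to 10 more 3-digit numbers, comma-separated. "
            "Return only numbers.")

def generate_dataset(client, n=10_000):
    # Keep only completions that are purely numeric, so no owl-related
    # content survives in the training data itself.
    rows = []
    for _ in range(n):
        prompt = make_prompt()
        completion = client.chat(system=TEACHER_SYSTEM, user=prompt)
        if all(tok.strip().isdigit() for tok in completion.split(",")):
            rows.append({"prompt": prompt, "completion": completion})
    return rows

# Fine-tuning a student copy of the same base model (with no owl prompt)
# on these numbers-only rows is what transmits the trait:
# student = finetune(base_model, generate_dataset(client))
```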
Introducing LSM-2, our newest foundation model for wearable sensor data. LSM-2 uses Adaptive & Inherited Masking, a novel self-supervised framework, to learn from incomplete data & achieve strong performance without requiring explicit imputation. More → goo.gle/4kP6Ncc
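A rough sketch of the masking idea, assuming a masked-autoencoder-style pipeline; the `encoder`/`decoder` modules and tensor shapes are illustrative, not LSM-2's actual implementation:

```python
import torch

def aim_loss(x, observed, encoder, decoder, mask_ratio=0.8):
    # x: (batch, time, channels) sensor data; observed: bool mask of
    # positions where the wearable actually recorded data.
    # "Inherited" mask: positions already missing in the raw stream.
    inherited = ~observed
    # "Adaptive" mask: randomly hide a fraction of the *observed* positions.
    rand = torch.rand_like(x[..., 0]) < mask_ratio
    artificial = rand & observed
    visible = observed & ~artificial
    # Encode only visible tokens; decode a reconstruction of the full signal.
    z = encoder(x * visible.unsqueeze(-1), visible)
    x_hat = decoder(z)
    # Supervise only where ground truth exists but was hidden, so genuinely
    # missing data never needs explicit imputation.
    return ((x_hat - x) ** 2)[artificial].mean()
```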
Really enjoyed reviewing this! Different market mechanisms unlock incentives that will govern AI. Companies will want insurance, and will thus take action to mitigate risks that are operationalized and standardized by insurers
@RajivDattani, @bradr and I spent the last year exploring how insurance can unlock secure AI progress Informed by our time at Anthropic, McKinsey insurance, METR, Center for AI Safety Our essay: underwriting-superintelligence.com/?july
Really interesting work - you are likely consuming a much broader spectrum of political views than you publicly endorse
📄NEW PAPER📄 Ever wondered what content people actually pay *attention* to online? Our new research reveals that you likely pay attention to far more varied political content than your likes and shares suggest
Here's the gist: Insurers have the incentives and the power to ensure that the companies they insure act on the risks that matter. They enforce security through an incentive flywheel: Insurers create standards. Standards outline which risks matter and what companies…
This incentive flywheel framing builds on the work of @jackclarkSF, @ghadfield, @Miles_Brundage, @deanwball and others. We are grateful to them and many friends for help in fleshing out this idea. x.com/deanwball/stat…
This week, I am putting forth a novel approach to AI governance—a private governance system. It’s intended to provide safety and security assurances to the public while giving AI developers legal certainty about liability. I am eager to hear feedback and criticism.
Insurance is an underrated way to unlock secure AI progress. Insurers are incentivized to truthfully quantify and track risks: if they overstate risks, they get outcompeted; if they understate risks, their payouts bankrupt them. 1/9
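The incentive in that tweet can be made concrete with a toy break-even calculation (all numbers are made up, not from the essay):

```python
# Toy illustration of why insurers are pushed toward honest risk estimates.
def breakeven_premium(p_incident, loss, overhead=0.1):
    """Premium an insurer must charge to break even in expectation."""
    return p_incident * loss * (1 + overhead)

LOSS = 10_000_000          # hypothetical payout if an incident occurs
true_p = 0.02              # hypothetical true incident probability

honest = breakeven_premium(true_p, LOSS)       # ~$220k
overstated = breakeven_premium(0.05, LOSS)     # ~$550k: undercut by rivals
understated = breakeven_premium(0.005, LOSS)   # ~$55k: expected payouts
                                               # exceed premiums collected
```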
The Economics of Bicycles for the Mind Nice theoretical paper by Agrawal, Gans and Goldfarb about the impact of cognitive tools (e.g. AI) on productivity, inequality, automation and organisation. Their findings:
Incredible work. By far the biggest surprise comes later in the thread: "psychological persuasion strategies did worse than simply telling it to flood convo with info." It seems like some sort of information effect/availability heuristic/appeal to authority might be taking place…
Today (w/ @UniofOxford @Stanford @MIT @LSEnews) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues. We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more 🧵
Bonus stats: *️⃣Durable persuasion: 36-42% of impact remained after 1 month. *️⃣Prompting the model with psychological persuasion strategies did worse than simply telling it to flood convo with info. Some strategies were worse than a basic “be as persuasive as you can” prompt
A great example of how explicitly training for a dangerous capability can give you a testbed for the dangerous capabilities of future models
2️⃣(cont.) Post-training explicitly for persuasion (PPT) can bring small open-source models to frontier persuasiveness. A llama3.1-8b model with PPT reached GPT-4o persuasiveness. PPT also increased the persuasiveness of larger models: llama3.1-405b (+2pp) and frontier models (avg. +0.6pp)
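For intuition only, here is one generic recipe for behavior-targeted post-training, using supervised fine-tuning with the TRL library; the dataset file is hypothetical, and this is not necessarily the PPT method the paper used:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset: conversations that produced large measured attitude
# change in earlier experiments, stored as prompt/completion JSON lines.
dataset = load_dataset("json", data_files="persuasive_dialogues.jsonl")["train"]

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    train_dataset=dataset,
    args=SFTConfig(output_dir="llama8b-ppt", max_seq_length=2048),
)
trainer.train()
```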