Chris Painter
@ChrisPainterYup
head of policy @METR_Evals | evals accelerationist, working hard on responsible scaling policies
When should AI companies publish system cards? I want to make the case that the ideal system would involve something closer to quarterly reporting, rather than focusing so much on deployment. Sharing here to get pushback and debate🧵
I was one of the 16 devs in this study. I wanted to share my opinions on the causes of, and mitigation strategies for, dev slowdown. I'll say as a "why listen to you?" hook that I experienced a -38% AI speedup on my assigned issues. I think transparency helps the community.
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
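To make that perception gap concrete, here is a minimal arithmetic sketch, not the study's analysis code; the 60-minute baseline and the reading of "20% faster" as "20% less time" are illustrative assumptions, not numbers from the paper.

```python
# Minimal sketch: what "19% slower" vs "felt 20% faster" means in time terms.
# The 60-minute baseline is a made-up illustration, not a study figure.

baseline_minutes = 60.0                      # hypothetical time per issue without AI

measured = baseline_minutes * 1.19           # measured effect: ~19% longer with AI allowed
perceived = baseline_minutes * (1 - 0.20)    # developers' perception: ~20% time saved

print(f"without AI:              {baseline_minutes:.0f} min")
print(f"with AI (measured):      {measured:.1f} min")
print(f"with AI (as perceived):  {perceived:.0f} min")
```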
The billion $ offers for senior AI talent story is an extension of the don’t-call-it-an-acquisition acquisitions story. If in 2 years you’ll have to pay billions to acquire startups mostly for the talent and not the IP, why not skip straight to the end and make the offer now?
A simple AGI safety technique: AI's thoughts are in plain English, just read them. We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc. threaten transparency. Experts from many orgs agree we should try to preserve it:…
METR previously estimated that the time horizon of AI agents on software tasks is doubling every 7 months. We have now analyzed 9 other benchmarks for scientific reasoning, math, robotics, computer use, and self-driving; we observe generally similar rates of improvement.
When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
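To see what a 7-month doubling time implies, here is a minimal extrapolation sketch; the 60-minute starting horizon and 21-month window are illustrative assumptions, not METR's published figures.

```python
# Minimal sketch of the extrapolation implied by a 7-month doubling time
# for task length ("time horizon"). Inputs below are illustrative only.

def projected_horizon(current_minutes: float,
                      months_ahead: float,
                      doubling_time_months: float = 7.0) -> float:
    """Project the task time horizon forward under exponential growth."""
    return current_minutes * 2 ** (months_ahead / doubling_time_months)

# If an agent can reliably complete ~1-hour tasks today, three doublings
# (21 months at a 7-month doubling time) would put it at ~8-hour tasks.
print(projected_horizon(60, 21))   # 480.0 minutes
```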
It's sad that tons of low-quality engagement on Twitter often pushes a lot of thorough debate/conversation into DMs. Introduces a huge selection effect in the quality of debate that people who only watch discourse on Twitter get to see. But hard to see how it could be different.
You might be interested in this other writeup, which was published during the study but before the findings were shared. Gives you a lens into how the developers felt about the experience without the bias of knowing the results.
New blog post: Evaluating AI's Impact on Haskell Open Source Development well-typed.com/blog/2025/04/a…
I'm a @METR_evals researcher evaluating Grok 4 on our time horizon benchmark. As an experiment, I'll try live-tweeting in this thread as I conduct the eval! This is all raw impressions. Please don't take it too seriously.
The devastating effects of these cuts are entirely preventable—and it’s not too late to reverse them.
An HIV doctor in Africa, whose work depends on USAID and PEPFAR, sends in a dispatch on the bleak situation now unfolding there thebulwark.com/p/a-religious-…
strong recommend @snewmanpv's write-up on our results. (and his commentary on AI more generally!) it's hard to put into words just how generous steve was with feedback; he has thought about this deeply. x.com/snewmanpv/stat…
How much time do AI coding tools save? @METR_Evals just released a rigorous study with a startling result: developers take 19% longer to complete tasks when using AI! The result is consistent with the idea that AI tools are most helpful for routine work in small projects,…
I think the field of AI dangerous-capability evaluations is moving increasingly away from "run these benchmarks" and toward "do this capital-intensive randomized controlled trial." This is harder to standardize, harder to ask every developer to do, and harder to do often.