Elizabeth Barnes

@BethMayBarnes

Joined July 2014

380 Following

3K Followers

Elizabeth Barnes@BethMayBarnes · Jul 10

This study surprised me! The conclusion is opposite to what I would expect. It is tempting to try to find a reason it's bogus but I think it's well executed and solid work. As the authors say, there are a number of potential caveats for this setting that may not generalize…

METR@METR_Evals · Jul 10

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

Elizabeth Barnes@BethMayBarnes · Jun 4

I had a lot of fun chatting with Rob about METR's work. I stand by my claims here that the world is not on track to keep risk from AI to an acceptable level, and we desperately need more people working on these problems.

Rob Wiblin@robertwiblin · Jun 2

AI models currently have a 50% chance of doing something that takes a human expert one hour. This doubles every 7 months. In 2 years? They could automate full workdays. In 4 years? A full month. I discuss the most important graph in AI today with Beth Barnes, the CEO of METR,…

Elizabeth Barnes@BethMayBarnes · Mar 20

Persnickety title would be: "there's an exponential trend with doubling time between ~2-12 months on automatically-scoreable, relatively clean + green-field software tasks from a few distributions". More detail on how we thought about external validity in paper and this thread

Megan Kinniment@MKinniment · Mar 19

Happy for this to be released! It’s the result of a lot of hard work from many of us at METR :) A big question is whether these results apply to ‘real’ tasks. Here’s some thoughts on that:

Elizabeth Barnes@BethMayBarnes · Mar 19

Guys, AI is going to eat a shit ton of jobs. I don’t see anyone really talking about this meaningfully in terms of what to do about it for people. What’s the plan?

METR@METR_Evals · Mar 19

When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.

