Elizabeth Barnes
@BethMayBarnes
This study surprised me! The conclusion is opposite to what I would expect. It is tempting to try to find a reason it's bogus but I think it's well executed and solid work. As the authors say, there are a number of potential caveats for this setting that may not generalize…
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
I had a lot of fun chatting with Rob about METR's work. I stand by my claims here that the world is not on track to keep risk from AI to an acceptable level, and we desperately need more people working on these problems.
AI models currently have a 50% chance of doing something that takes a human expert one hour. This doubles every 7 months. In 2 years? They could automate full workdays. In 4 years? A full month. I discuss the most important graph in AI today with Beth Barnes, the CEO of METR,…
Persnickety title would be: "there's an exponential trend with doubling time between ~2 -12 months on automatically-scoreable, relatively clean + green-field software tasks from a few distributions". More detail on how we thought about external validity in paper and this thread
Happy for this to be released! It’s the result of a lot of hard work from many of us at METR :) A big question is whether these results apply to ‘real’ tasks. Here’s some thoughts on that:
Guys, AI is going to eat a shit ton of jobs. I don’t see anyone really talking about this meaningfully in terms of what to do about it for people. What’s the plan?
When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
Happy for this to be released! It’s the result of a lot of hard work from many of us at METR :) A big question is whether these results apply to ‘real’ tasks. Here’s some thoughts on that:
When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.