Alexandra Barr
@BarrAlexandra
human data @OpenAI
After a year at openai, I’m soft launching my new ai newsletter! The OGs will know superfast but this one is now more representative (superslow) superslow-ai.com
I am so interested in every METR paper that comes out *immediately added to shopping cart* 🛒
METR previously estimated that the time horizon of AI agents on software tasks is doubling every 7 months. We have now analyzed 9 other benchmarks for scientific reasoning, math, robotics, computer use, and self-driving; we observe generally similar rates of improvement.
The great thing about doing taxes this time of year is that accountants are not underwater right now. Really it’s thoughtful actually. Uh no, I haven’t done my taxes yet.
for the builders: if anyone wants a referral to oai dev day lmk
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
agent is here! so so proud of the team!! 🫶
ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.
Oops, it’s been a month since my last post. New post on optimizing test-time vs training-time allocations on constrained compute budgets. Landing tmrw. Sneak peek:
Noodling over some thoughts around compute-optimal pretraining vs. inference-time usage: Is it better to use your compute budget on model training or on "thinking deeply" when you ask it a question?
Crazyy
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
On human datasets, pick two: 1. Volume target 2. Timeline target 3. Quality target. This feels real