Greg Kamradt
@GregKamradt
President @arcprize, Founder https://www.leverage.to, builder/engineer
Intelligence is interactive. Life does not happen in a single turn, yet frontier AI is measured with static benchmarks. Today we're releasing a preview of ARC-AGI-3, an Interactive Reasoning Benchmark. You can play (and build agents) on it today.
Today, we're announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI. We're releasing:
* 3 games (environments)
* $10K agent contest
* AI agents API
Starting scores: Frontier AI: 0%, Humans: 100%
Anyone have a connection at @Alibaba_Qwen? We're trying to reproduce the results on @arcprize and getting different metrics. Want to get hold of them and find out how they tested.
.@arcprize listed on the @Alibaba_Qwen model card. 2nd model card for us in 2 weeks. Excited for ARC-AGI to be seen as a supported way to measure model performance. x.com/Alibaba_Qwen/s…
"utm_source=openai" Has a URL parameter ever added more market value to a company than this?
AGI is a threshold of capability. There will be as many variations as there are sorting algorithms.
So strange how most people refer to "AGI" generically as one monolith, like they'll all be the same. There isn't going to be one AGI, or one type of AGI. There's going to be an infinite variety of flavors.
We've had 3 leaders in the past 7 days for @arcprize. The Top Score prize ($50K pool) is heating up.
New ARC Prize 2025 High Score: 17.6% by Giotto.ai (@podesta_aldo)
This thread has a great intro on building agents for ARC-AGI-3. Competition open for 27 more days.
TL;DR:
- We built an agent that consistently passes level 1 and partially completes level 2.
- To do so, we had to assist the agent with some pre-computed values provided in its context (e.g., the door, the key color, the rules of the game, etc.).
- The agent was…
Before the preview, we worked with the team to beta test ARC-AGI-3 using our own agents. The games are devilishly hard, and it took a lot of tricks to get them to work. Here are our key learnings (🧵):
My bar for robotics AGI (do anything a human can) is: get under my house, fix a pipe in the crawl space, then come up and make me sign an invoice.
my bar for agi is an ai that can learn to run a gas station for a year without a team of scientists collecting the Gas Station Dataset