Mike Knoop
@mikeknoop
co-founder @ndea and @zapier @arcprize
Today we’re releasing our first public preview of ARC-AGI-3: the first three games. Version 3 is a big upgrade over v1 and v2, which are designed to challenge pure deep learning and static reasoning. In contrast, v3 challenges interactive reasoning (e.g. agents). The full version…
I was in contact with the Qwen team trying to reproduce their 41% result on ARC-AGI-1 but ultimately couldn't. They open-sourced their method and code if anyone wants to check it out and confirm. We tested their model exactly the same way we test all other models (o3-high, grok…
Qwen3-235b-a22b Instruct-2507 ARC-AGI Semi Private Eval
* ARC-AGI-1: 11%, $0.003/task
* ARC-AGI-2: 1.3%, $0.004/task
"Cloud Science" is going to be a major new jobs sector as we get closer to AGI.
New paper from an ARC Prize 2024 top paper author
My new paper proposes an implementation of execution-guided neural program synthesis for ARC-AGI (@arcprize), and compares its compositional generalization capabilities with a few alternatives such as test-time fine-tuning. The conclusion is that execution-guided neural program…
2D grid puzzles are somehow AI kryptonite
I’m no mathematician but curious why 1) No lab solved this 2) Is this much harder than P1-5 3) What specifically about this problem is difficult
Given the added complexity of dealing with interactive environments, we tried to make getting started with ARC v3 research as simple as possible.
> git clone https://github.com/arcprize/ARC-AGI-3-Agents.git && cd ARC-AGI-3-Agents && uv sync
> cp .env-example .env
> uv run main.py --agent=random --game=ls20
You just ran your first agent against ARC-AGI-3
A hallmark of human intelligence is the capacity for rapid adaptation, solving new problems quickly under novel and unfamiliar conditions. How can we build machines to do so? In our new preprint, we propose that any general intelligence system must have an adaptive world model,…
Still looking for a useful definition of "ASI" that isn't marketing (where AGI is defined as human-level skill acquisition efficiency). A few vectors:
1. Total skill
2. Reasoning Kolmogorov complexity
3. Data efficiency
AI is already super at (1) but inferior at (2) and (3).
Based on public information, major AI labs are pushing two AI reasoning frontiers to improve "process models" that generate the reasoning chains (or reasoning programs):
1. More search
2. More domains
More test-time search is being deployed via improved process models to cover…
Reminder: our v3 preview launch today is to make contact with reality and learn about our game design choices. We have a lot of work to do this year. Full v3 will launch early 2026. v1 continues to be useful for measuring the Pareto frontier. And v2 remains entirely unsaturated.
ARC-AGI 3 is already here. we haven't even completed half of ARC-AGI 2, and now there's ARC-3. and wasn't the test meant to tell us when we've reached AGI? now the models are getting close, they keep making new tests and shifting the goalposts. Turing test passed, ARC-AGI 1…
Live now! Watch @johncoogan and @jordihays try to play ARC v3 x.com/tbpn/status/19…
Come watch me prove my humanity by playing this live on stream today. 1:45pm pacific.
Good thread on state of the art agents for ARC v3
Here are some of the experiments and observations I made as one of the initial testers of the locksmith game within ARC-AGI-3 (my template is available in the repository) 🧵
Today, we're announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI
We’re releasing:
* 3 games (environments)
* $10K agent contest
* AI agents API
Starting scores - Frontier AI: 0%, Humans: 100%
there is a serious automation idea in here! if you have an eval that correctly classifies certain inputs as too far "out of distribution" for today's reasoning systems, you can automatically route them to humans
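A rough sketch of that routing idea in Python, for illustration only. Everything here is hypothetical: `route_tasks`, the `ood_score` callable, the `threshold` value, and the toy `novelty` field are assumptions, not part of any ARC Prize tooling. The only thing it shows is the shape of the logic: score each input for how far out of distribution it looks, then send high-scoring inputs to a human.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Routed:
    task_id: str
    handler: str  # "ai" or "human"
    score: float  # out-of-distribution score returned by the eval


def route_tasks(
    tasks: Dict[str, Any],
    ood_score: Callable[[Any], float],  # hypothetical eval: higher means more novel
    threshold: float = 0.5,             # assumed cutoff, would need tuning in practice
) -> List[Routed]:
    """Send inputs the eval flags as too novel to a human, the rest to the AI."""
    routed = []
    for task_id, payload in tasks.items():
        score = ood_score(payload)
        handler = "human" if score >= threshold else "ai"
        routed.append(Routed(task_id, handler, score))
    return routed


if __name__ == "__main__":
    # Toy inputs with a made-up "novelty" field standing in for a real eval.
    demo = {"invoice-parse": {"novelty": 0.1}, "new-grid-game": {"novelty": 0.9}}
    for r in route_tasks(demo, lambda p: p["novelty"]):
        print(r.task_id, "->", r.handler)
```

The hard part is the scoring function itself, which this sketch deliberately leaves as a placeholder.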
a fun way to get a top ARC score using the new ChatGPT Agent: "solve this ARC task any way possible ... don't forget fiverr exists"
AGI is idea constrained and talent is distributed. The US must choose if it wants that innovation to happen here. Progress will happen regardless.
It is a major policy failure that the US cannot accommodate top AI conferences due to visa issues.