Mikel Bober-Irizar
@mikb0b
23 // Kaggle Competitions Grandmaster & ML/AI Researcher. Building video games @iconicgamesio, machine reasoning @Cambridge_CL, bioscience @ForecomAI.
Why do pre-o3 LLMs struggle with generalization tasks like @arcprize? It's not what you might think. OpenAI o3 shattered the ARC-AGI benchmark. But the hardest puzzles didn’t stump it because of reasoning, and this has implications for the benchmark as a whole. Analysis below🧵

Really good to be back in SF for GDC (yes, our game is still cooking 👀) If you're around and want to meet up next week, let me know!

Seeing this chart go around a bunch, I think the main point is being missed: “LLMs can’t solve large grids because of perception.” This is a deficiency in the model; there are alternative ways to “perceive” the grid. Doing it in 1-shot is not required. As a human, do you hold…
LLMs get dramatically worse at ARC tasks as the grids get bigger. Humans have no such issue - ARC task difficulty is independent of size. Most ARC tasks contain around 512-2048 pixels, and o3 is the first model capable of operating on these text grids reliably.
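For context on what those pixel counts mean, here's a minimal sketch of how you might count the grid cells a model has to serialise per task. It assumes a local clone of the public ARC dataset in the fchollet/ARC JSON layout; the path is an assumption:

```python
import glob
import json

# Sketch: count how many grid cells ("pixels") each public ARC task contains.
# Assumes the JSON layout of the fchollet/ARC repo checked out locally.
sizes = []
for path in glob.glob("ARC/data/training/*.json"):
    with open(path) as f:
        task = json.load(f)
    cells = sum(
        len(grid) * len(grid[0])  # rows * cols of each grid
        for pair in task["train"] + task["test"]
        for grid in (pair["input"], pair["output"])
    )
    sizes.append(cells)

sizes.sort()
print(f"{len(sizes)} tasks, median {sizes[len(sizes) // 2]} cells, "
      f"range {sizes[0]}-{sizes[-1]}")
```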
I recommend reading @mikb0b 's article on o3's performance on the ARC challenge. He shows that LLMs' struggles with ARC stem from their inability to easily process large 2D grids.
This is a really good observation! I wrote about it and analyzed why in this article: anokas.substack.com/p/llms-struggl…
more evidence (including experiments varying sizes of problems) that grid size alone plays a significant role in arc. this is obviously far from ideal for a reasoning benchmark and can hopefully get addressed in arc-agi-2
Really great to meet and catch up with @mikb0b in person after many years! 😄
I'm heading back to San Francisco for @Official_GDC 🎮 - if anyone's around the bay area in late March and wants to meet up, let me know!
I'll be speaking at @NVIDIA's AI & DS Virtual Summit about the journey to becoming the youngest Kaggle Grandmaster, along with @Rob_Mulla and @kagglingdieter. 🔥 Come and join us for a live Q&A on Wednesday 9th at 12pm PT (for free!) nvidia.com/en-us/events/a… @NVIDIAAI

I'm going to be in San Francisco in early November! ✈️ If anyone's in the bay area and wants to meet up, or if anyone knows any events I should check out, let me know! 😊
I've recently been playing with @fchollet's Abstraction and Reasoning Corpus, a really interesting benchmark for building systems that can reason. As part of that, I've just released a small 🐍 library for easily interacting with and visualising ARC: github.com/mxbi/arckit
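For anyone trying it out, here is roughly what usage looks like based on the arckit README at the time; the exact function names are assumptions and may have changed since:

```python
import arckit

# Load the public ARC training and evaluation task sets (per the arckit README).
train_set, eval_set = arckit.load_data()

task = train_set[0]  # grab one task
task.show()          # display its input/output grids in the terminal
```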

Really proud to be published in a Nature Portfolio journal for the first time! We set a new SOTA for single-cell protein localisation on the @ProteinAtlas, building on our work in the 2nd HPA Kaggle comp. nature.com/articles/s4200… @ForecomAI @cvssp_research @d_minskiy