Guillermo Barbadillo
@guille_bar
In a quest to understand intelligence. Talking about AI in Spanish on TERTULia: https://ironbar.github.io/tertulia_inteligencia_artificial/
Evolution of computing power over time github.com/ironbar/comput…

Nice paper that, in my opinion, goes in the right direction toward solving ARC. It generates Python code to tackle the ARC tasks and combines search and learning in a virtuous cycle. I have summarized the results in the following plot.
Introducing SOAR 🚀, a self-improving framework for program synthesis that alternates between search and learning (accepted to #ICML!). It brings LLMs from just a few percent on ARC-AGI-1 up to 52%. We're releasing the finetuned LLMs, a dataset of 5M generated programs, and the code. 🧵
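A rough sketch of what that search-and-learn loop could look like; the helper names (sample_programs, execute, finetune) are my own placeholders for illustration, not SOAR's actual API:

```python
# Hedged sketch of an alternating search/learning loop for program
# synthesis on ARC-style tasks. All helpers (sample_programs, execute,
# finetune) are hypothetical stand-ins, not SOAR's actual API.

def search_and_learn(model, tasks, rounds=3, samples_per_task=100):
    for _ in range(rounds):
        solved = []
        # Search: sample candidate Python programs for each task and
        # keep those that reproduce every demonstration input/output pair.
        for task in tasks:
            for program in sample_programs(model, task, n=samples_per_task):
                if all(execute(program, x) == y for x, y in task.train_pairs):
                    solved.append((task, program))
        # Learning: finetune the model on its own verified solutions,
        # so the next search round starts from a stronger prior.
        model = finetune(model, solved)
    return model
```

The "virtuous cycle" is exactly this coupling: search generates verified training data, and learning makes the next round of search cheaper.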
As far as I understand, this is another case of test-time training, since they use example pairs from both the training and evaluation sets. I'm not sure whether the hierarchical architecture is necessary, or whether we could get similar results with other models.
Impressive work by @makingAGI and team. No pre-training or CoT, with material performance on ARC-AGI.
> With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1,000 training samples
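For context on what test-time training means in the ARC setting: adapt the model on a task's own demonstration pairs before predicting its test output. A minimal sketch, where adapt and predict are hypothetical helpers, not anyone's published code:

```python
# Hedged sketch of test-time training on an ARC-style task: briefly
# finetune on the task's demonstration pairs, then predict the test
# output. adapt() and predict() are hypothetical stand-ins.

def solve_with_test_time_training(base_model, task, steps=100):
    # Each ARC task ships a handful of (input_grid, output_grid) examples.
    model = adapt(base_model, task.train_pairs, steps=steps)
    return predict(model, task.test_input)
```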
Big news: we've figured out how to make a *universal* reward function that lets you apply RL to any agent with:
- no labeled data
- no hand-crafted reward functions
- no human feedback!
A 🧵 on RULER
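The tweet doesn't spell out the mechanism, but one way a label-free reward like this can work is to have an LLM judge score a group of candidate trajectories relative to each other, and use those scores as RL rewards. A sketch under that assumption; the judge prompt and llm_judge call are hypothetical illustrations, not RULER's actual API:

```python
# Hedged sketch of a label-free, relative reward signal: an LLM judge
# scores N attempts at the same task against each other, and the
# resulting scores become RL rewards. The prompt and llm_judge call
# are hypothetical stand-ins, not RULER's actual API.

JUDGE_PROMPT = """You are given {n} attempts at the same task.
Score each attempt from 0 to 1 by how well it accomplishes the goal,
judging the attempts relative to each other. Return one score per line."""

def relative_rewards(llm_judge, task, trajectories):
    prompt = f"Task: {task}\n" + JUDGE_PROMPT.format(n=len(trajectories))
    for i, t in enumerate(trajectories):
        prompt += f"\n\n--- Attempt {i + 1} ---\n{t}"
    # One judge call scores the whole group, so scores are comparable
    # within the group even without any ground-truth labels.
    lines = llm_judge(prompt).strip().splitlines()
    return [float(line) for line in lines[: len(trajectories)]]
```

Because such scores only need to be consistent within a group, they pair naturally with group-relative RL algorithms such as GRPO.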
Hello Gemini 2.5 Flash-Lite! So fast, it codes *each screen* on the fly (Neural OS concept 👇). The frontier isn't always about large models and beating benchmarks. In this case, a super fast and capable model can unlock drastically new use cases. Read more: blog.google/products/gemin…
Anthropic co-founder Ben Mann says we'll know AI is transformative when it passes the "Economic Turing Test." Give an AI agent a job for a month. Let the hiring manager choose: human or machine? When they pick the machine more often than not, we've crossed the threshold.
I try not to anthropomorphize AI, but the other day it made me cry
ARC-AGI-3 will be interactive and similar in spirit to the Animal-AI Olympics by Matthew Crosby youtu.be/52ibXR6A1TE?si…
Interactive Reasoning Benchmarks are the next step in frontier evaluations. Hear @GregKamradt share why measuring human-like intelligence requires multi-turn environments, including a sneak peek of ARC-AGI-3. Want to help us build interactive evaluations? We're hiring
Google quietly released an app that lets you download and run AI models locally ift.tt/aoXg0zm
This is Plastic. Made with Veo3. Spoilers in the next post. Watch before reading
LLMs, RL, and rockets! 🚀 Cool paper showing how test-time reinforcement learning can optimize engineering problems when a continuous reward signal is available.
🚀 New paper: LLMs for Engineering: Teaching Models to Design High-Powered Rockets 🚀 We built an environment that lets models design high-powered rockets, and we show that with RL, models can surpass human designs!
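To make the "continuous reward signal" point concrete, here is a toy, self-contained sketch of test-time RL on a design problem: a Gaussian policy over design parameters updated with a REINFORCE-style rule. The simulator and reward are made-up stand-ins (a 1-D distance to a hidden optimum), not the paper's rocket environment:

```python
# Hedged sketch of test-time RL on an engineering task with a
# continuous reward. The reward below is a toy stand-in, not the
# paper's rocket simulator.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = np.zeros(3), 1.0  # Gaussian policy over 3 design parameters

def reward(design):
    # Toy continuous signal: the closer to a hidden optimum, the higher.
    optimum = np.array([1.2, -0.7, 0.4])
    return -np.sum((design - optimum) ** 2)

for step in range(200):
    designs = mu + sigma * rng.standard_normal((64, 3))  # sample candidates
    rewards = np.array([reward(d) for d in designs])
    # REINFORCE-style update: move the policy mean toward designs
    # that scored above the batch average.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    mu += 0.05 * (advantages[:, None] * (designs - mu)).mean(axis=0)

print("best design found:", mu)
```

The continuous reward is what makes this work: every sampled design gets a graded score, so the policy always has a gradient to follow.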
The released version of o3 scores just 3% on ARC-AGI-2. Adaptation to novelty is still an unsolved problem in AI (and intelligence is all about adaptation to novelty)
o3 and o4-mini on ARC-AGI's Semi-Private Evaluation:
* o3-medium scores 53% on ARC-AGI-1
* o4-mini shows state-of-the-art efficiency
* ARC-AGI-2 remains virtually unsolved (<3%)
Through analysis we highlight differences from o3-preview and other model behaviors
OpenAI's o3: Over-optimization is back and weirder than ever Tools, true rewards, and a new direction for language models. interconnects.ai/p/openais-o3-o…
Microsoft just released the first natively trained 1-bit model: BitNet 2B, trained on 4 trillion tokens with native 1.58-bit weights and 8-bit activations (W1.58A8). It performs very close to Qwen 2.5 1.5B on benchmarks while being 1/6 of its size and twice as fast.
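For intuition on "1.58-bit": each weight takes one of three values {-1, 0, +1}, and log2(3) ≈ 1.58 bits. A minimal sketch of absmean ternary quantization in the spirit of the BitNet b1.58 paper; this is an illustration, not Microsoft's released code:

```python
# Hedged sketch of ternary (1.58-bit) weight quantization in the spirit
# of BitNet b1.58: scale by the mean absolute weight, then round each
# weight to {-1, 0, +1}. Illustration only, not the released implementation.
import numpy as np

def quantize_ternary(w, eps=1e-5):
    scale = np.mean(np.abs(w)) + eps          # absmean scaling factor
    wq = np.clip(np.round(w / scale), -1, 1)  # each weight in {-1, 0, +1}
    return wq, scale                          # dequantize as wq * scale

w = np.random.randn(4, 4).astype(np.float32)
wq, scale = quantize_ternary(w)
print(wq)          # ternary matrix
print(wq * scale)  # approximate reconstruction of w
```

Ternary weights are what enable the size and speed wins: matrix multiplies reduce to additions and subtractions, with one scale factor per tensor.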
Happy to be the first team to break the 10% barrier on ARC-AGI-2. I hope to make small improvements in the coming days, but hitting 20%+ might take some black magic. 🧙‍♂️
When do you think we'll see the first >10% ARC Prize entry on Kaggle?
Thanks to gpt-4o, I can now devote myself to my true calling

What if Studio Ghibli directed Lord of the Rings? I spent $250 in Kling credits and 9 hours re-editing the Fellowship trailer to bring that vision to life—and I’ll show you exactly how I did it 👇🏼