Guillermo Barbadillo
@guille_bar
In a quest to understand intelligence. Talking about AI in Spanish on TERTULia: https://ironbar.github.io/tertulia_inteligencia_artificial/
Evolution of computing power over time github.com/ironbar/comput…

Nice paper that, in my opinion, goes in the right direction toward solving ARC. It generates Python code to tackle the ARC tasks and combines search and learning in a virtuous cycle. I have summarized the results in the following plot.
Introducing SOAR 🚀, a self-improving framework for program synthesis that alternates between search and learning (accepted to #ICML!). It brings LLMs from just a few percent on ARC-AGI-1 up to 52%. We're releasing the finetuned LLMs, a dataset of 5M generated programs, and the code. 🧵
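A rough sketch of what that search-and-learn loop could look like; the helper names (sample_programs, execute, finetune) are my own placeholders for illustration, not SOAR's actual API:

```python
# Hedged sketch of an alternating search/learning loop for program
# synthesis on ARC-style tasks. All helpers (sample_programs, execute,
# finetune) are hypothetical stand-ins, not SOAR's actual API.

def search_and_learn(model, tasks, rounds=3, samples_per_task=100):
    for _ in range(rounds):
        solved = []
        # Search: sample candidate Python programs for each task and
        # keep those that reproduce every demonstration input/output pair.
        for task in tasks:
            for program in sample_programs(model, task, n=samples_per_task):
                if all(execute(program, x) == y for x, y in task.train_pairs):
                    solved.append((task, program))
        # Learning: finetune the model on its own verified solutions,
        # so the next search round starts from a stronger prior.
        model = finetune(model, solved)
    return model
```

The "virtuous cycle" is exactly this coupling: search generates verified training data, and learning makes the next round of search cheaper.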
As far as I understand, this is another case of test-time training, since they use example pairs from both the training and evaluation sets. I'm not sure whether the hierarchical architecture is necessary, or whether we could get similar results with other models.
Impressive work by @makingAGI and team. No pre-training or CoT, with material performance on ARC-AGI.
> With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1,000 training samples
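For context on what test-time training means in the ARC setting: adapt the model on a task's own demonstration pairs before predicting its test output. A minimal sketch, where adapt and predict are hypothetical helpers, not anyone's published code:

```python
# Hedged sketch of test-time training on an ARC-style task: briefly
# finetune on the task's demonstration pairs, then predict the test
# output. adapt() and predict() are hypothetical stand-ins.

def solve_with_test_time_training(base_model, task, steps=100):
    # Each ARC task ships a handful of (input_grid, output_grid) examples.
    model = adapt(base_model, task.train_pairs, steps=steps)
    return predict(model, task.test_input)
```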
Big news: we've figured out how to make a *universal* reward function that lets you apply RL to any agent with:
- no labeled data
- no hand-crafted reward functions
- no human feedback!
A 🧵 on RULER
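The tweet doesn't spell out the mechanism, but one way a label-free reward like this can work is to have an LLM judge score a group of candidate trajectories relative to each other, and use those scores as RL rewards. A sketch under that assumption; the judge prompt and llm_judge call are hypothetical illustrations, not RULER's actual API:

```python
# Hedged sketch of a label-free, relative reward signal: an LLM judge
# scores N attempts at the same task against each other, and the
# resulting scores become RL rewards. The prompt and llm_judge call
# are hypothetical stand-ins, not RULER's actual API.

JUDGE_PROMPT = """You are given {n} attempts at the same task.
Score each attempt from 0 to 1 by how well it accomplishes the goal,
judging the attempts relative to each other. Return one score per line."""

def relative_rewards(llm_judge, task, trajectories):
    prompt = f"Task: {task}\n" + JUDGE_PROMPT.format(n=len(trajectories))
    for i, t in enumerate(trajectories):
        prompt += f"\n\n--- Attempt {i + 1} ---\n{t}"
    # One judge call scores the whole group, so scores are comparable
    # within the group even without any ground-truth labels.
    lines = llm_judge(prompt).strip().splitlines()
    return [float(line) for line in lines[: len(trajectories)]]
```

Because such scores only need to be consistent within a group, they pair naturally with group-relative RL algorithms such as GRPO.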
Hello Gemini 2.5 Flash-Lite! So fast, it codes *each screen* on the fly (Neural OS concept 👇). The frontier isn't always about large models and beating benchmarks. In this case, a super fast and capable model can unlock drastically new use cases. Read more: blog.google/products/gemin…
Anthropic co-founder Ben Mann says we'll know AI is transformative when it passes the "Economic Turing Test." Give an AI agent a job for a month. Let the hiring manager choose: human or machine? When they pick the machine more often than not, we've crossed the threshold.
I try not to anthropomorphize AI, but the other day it made me cry
ARC-AGI-3 will be interactive and similar in spirit to the Animal-AI Olympics by Matthew Crosby youtu.be/52ibXR6A1TE?si…
Interactive Reasoning Benchmarks are the next step in frontier evaluations. Hear @GregKamradt share why measuring human-like intelligence requires multi-turn environments, including a sneak peek of ARC-AGI-3. Want to help us build interactive evaluations? We're hiring
Google quietly released an app that lets you download and run AI models locally ift.tt/aoXg0zm
This is Plastic. Made with Veo3. Spoilers in the next post. Watch before reading
LLMs, RL, and rockets! 🚀 Cool paper showing how test-time reinforcement learning can optimize engineering problems when a continuous reward signal is available.
🚀 New paper: LLMs for Engineering: Teaching Models to Design High-Powered Rockets 🚀 We built an environment that lets models design high-powered rockets, and we show that with RL, models can surpass human designs!
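To make the "continuous reward signal" point concrete, here is a toy, self-contained sketch of test-time RL on a design problem: a Gaussian policy over design parameters updated with a REINFORCE-style rule. The simulator and reward are made-up stand-ins (a 1-D distance to a hidden optimum), not the paper's rocket environment:

```python
# Hedged sketch of test-time RL on an engineering task with a
# continuous reward. The reward below is a toy stand-in, not the
# paper's rocket simulator.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = np.zeros(3), 1.0  # Gaussian policy over 3 design parameters

def reward(design):
    # Toy continuous signal: the closer to a hidden optimum, the higher.
    optimum = np.array([1.2, -0.7, 0.4])
    return -np.sum((design - optimum) ** 2)

for step in range(200):
    designs = mu + sigma * rng.standard_normal((64, 3))  # sample candidates
    rewards = np.array([reward(d) for d in designs])
    # REINFORCE-style update: move the policy mean toward designs
    # that scored above the batch average.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    mu += 0.05 * (advantages[:, None] * (designs - mu)).mean(axis=0)

print("best design found:", mu)
```

The continuous reward is what makes this work: every sampled design gets a graded score, so the policy always has a gradient to follow.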
The released version of o3 scores just 3% on ARC-AGI-2. Adaptation to novelty is still an unsolved problem in AI (and intelligence is all about adaptation to novelty)
o3 and o4-mini on ARC-AGI's Semi-Private Evaluation:
* o3-medium scores 53% on ARC-AGI-1
* o4-mini shows state-of-the-art efficiency
* ARC-AGI-2 remains virtually unsolved (<3%)
Through analysis we highlight differences from o3-preview and other model behaviors
OpenAI's o3: Over-optimization is back and weirder than ever Tools, true rewards, and a new direction for language models. interconnects.ai/p/openais-o3-o…
Microsoft just released the first natively trained 1-bit model: BitNet 2B, trained on 4 trillion tokens with native 1.58-bit weights and 8-bit activations (W1.58A8). It performs very close to Qwen 2.5 1.5B on benchmarks while being 1/6 of its size and twice as fast.
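For intuition on "1.58-bit": each weight takes one of three values {-1, 0, +1}, and log2(3) ≈ 1.58 bits. A minimal sketch of absmean ternary quantization in the spirit of the BitNet b1.58 paper; this is an illustration, not Microsoft's released code:

```python
# Hedged sketch of ternary (1.58-bit) weight quantization in the spirit
# of BitNet b1.58: scale by the mean absolute weight, then round each
# weight to {-1, 0, +1}. Illustration only, not the released implementation.
import numpy as np

def quantize_ternary(w, eps=1e-5):
    scale = np.mean(np.abs(w)) + eps          # absmean scaling factor
    wq = np.clip(np.round(w / scale), -1, 1)  # each weight in {-1, 0, +1}
    return wq, scale                          # dequantize as wq * scale

w = np.random.randn(4, 4).astype(np.float32)
wq, scale = quantize_ternary(w)
print(wq)          # ternary matrix
print(wq * scale)  # approximate reconstruction of w
```

Ternary weights are what enable the size and speed wins: matrix multiplies reduce to additions and subtractions, with one scale factor per tensor.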
Happy to be the first team to break the 10% barrier on ARC-AGI-2. I hope to make small improvements in the coming days, but hitting 20%+ might take some black magic. 🧙‍♂️
When do you think we'll see the first >10% ARC Prize entry on Kaggle?
Thanks to gpt-4o, I can now devote myself to my true calling

What if Studio Ghibli directed Lord of the Rings? I spent $250 in Kling credits and 9 hours re-editing the Fellowship trailer to bring that vision to life—and I’ll show you exactly how I did it 👇🏼