Raphaël Millière
@raphaelmilliere
Philosopher of Artificial Intelligence & Cognitive Science @Macquarie_Uni. Past: @Columbia, @UniofOxford. Also on other platforms. Blog: http://artificialcognition.net
Transformer-based neural networks achieve impressive performance on coding, math & reasoning tasks that require keeping track of variables and their values. But how can they do that without explicit memory? 📄 Our new ICML paper investigates this in a synthetic setting! 🧵 1/13
Today we’re launching AutumnBench, our benchmark built on @BasisOrg’s Autumn platform. It’s designed to measure world modeling and reasoning by placing humans and AI in unfamiliar worlds—with no rewards or guidance—to test who can figure out how these worlds actually work.
We’re proud to announce the launch of AutumnBench, an open-source benchmark developed on our Autumn platform. This benchmark, led by our MARA team, provides a novel platform for evaluating world modeling and causal reasoning in both human and artificial intelligence.
Whale vocalizations not only resemble human vowels, but also behave like them! We previously discovered that sperm whales have analogues to human vowels. In a new preprint, we analyze the linguistic behavior of whale vowels.
Great question! The final checkpoint makes only 16 errors out of 50k test programs. Of those, 5 are consistent with the effect of early-line heuristics: 2 errors where it predicts the constant appearing on line 1, and 3 errors where it predicts the constant appearing on line 2.