Alex Goldie
@AlexDGoldie
RS intern @wayve_ai 🚗 PhD student at 🤖 @whi_rl and @flair_ox 🤖 First Class MEng from Oxford 🎓
1/ 🕵️ Algorithm discovery could lead to huge AI breakthroughs! But what is the best way to learn or discover new algorithms? I'm so excited to share our brand new @rl_conference paper which takes a step towards answering this! 🧵

I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an…
Very proud of this work! If you're interested in AI agents and their current challenges, give this a read. Thanks to my incredible collaborators and to @Meta and @ucl for enabling me to tackle something of this scale for my first PhD paper. Excited for what's ahead!
Scaling AI research agents is key to tackling some of the toughest challenges in the field. But what's required to scale effectively? It turns out that simply throwing more compute at the problem isn't enough. We break down an agent into four fundamental components that shape…
Interested in Long-Range Interactions? Come speak with us now (4:30pm-7pm) at our poster E-2802 @ #ICML2025 @benpgutteridge @jacobbamberger @mmbronstein @epomqo
I think ARC is a great eval, but at this point we should just use NetHack
Today we’re releasing our first public preview of ARC-AGI-3: the first three games. Version 3 is a big upgrade over v1 and v2, which were designed to challenge pure deep learning and static reasoning. In contrast, v3 challenges interactive reasoning (e.g. agents). The full version…
Unlock real diversity in your LLM! 🚀 LLM outputs can be boring and repetitive. Today, we release Intent Factored Generation (IFG) to: - Sample conceptually diverse outputs💡 - Improve performance on math and code reasoning tasks🤔 - Get more engaging conversational agents 🤖
Unlock the Hidden Diversity in Your Language Model. In our new paper, Intent Factored Generation (IFG), we propose an inference time method to increase the diversity of generations from LLMs. IFG leads to improvements in searching for solutions to maths and code problems. (1/6)
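For context, here is a rough sketch of the two-stage sampling idea the thread describes: draw a short, high-level "intent" at a higher temperature for diversity, then generate the final answer conditioned on that intent at a lower temperature for quality. The `sample` helper, prompt wording, and temperatures are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch of intent-factored sampling (assumed interface, not the
# paper's implementation). `sample(prompt, temperature)` stands in for any
# LLM sampling call.
def intent_factored_generate(sample, question,
                             intent_temp=1.2, answer_temp=0.7):
    # Stage 1: commit to a conceptually diverse high-level plan.
    intent = sample(
        f"Question: {question}\nIn one sentence, outline an approach:",
        temperature=intent_temp,
    )
    # Stage 2: realise that plan with a focused, low-temperature generation.
    answer = sample(
        f"Question: {question}\nApproach: {intent}\nFull solution:",
        temperature=answer_temp,
    )
    return intent, answer
```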
Gradual Disempowerment puts a name to one of the greatest and least controversial AI risks. How to maintain our current balance of power and human autonomy is one of the most pressing questions of our generation.
We're presenting our ICML Position paper "Humanity Faces Existential Risk from Gradual Disempowerment": come talk to us today at East Exhibition Hall E-503. @DavidDuvenaud @raymondadouglas @AmmannNora @DavidSKrueger Also: meet Mary, protagonist of our poster.
Antiviral therapy design is myopic 🦠🙈: optimised only for the current strain. That's why you need a different flu vaccine every year! Our #ICML2025 paper ADIOS proposes "shaper therapies" that steer viral evolution in our favour & remain effective. Work done @FLAIR_Ox 🧵👇
There is a lot of research alpha in looking at why things don’t work (when it’s plausible they should). Much better than looking at things that we know work and trying to figure out why.
Hyperparameters are the worst, but Hyperoptax makes dealing with them a little bit less bad
🚀 Excited to announce Hyperoptax, a library for parallel hyperparameter tuning in JAX. Implements Grid, Random, and Bayesian search in pure JAX so that you can rapidly search across parameter configurations in parallel. 📦 pip install hyperoptax github.com/TheodoreWolf/h…
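To see what "search in parallel in pure JAX" buys you, here is a minimal sketch of the underlying idea using jax.vmap to evaluate an entire grid of configurations in one batched call, on a toy objective. This illustrates the technique only; it is not Hyperoptax's actual API.

```python
# Sketch: parallel grid search via jax.vmap (toy objective, assumed setup).
import jax
import jax.numpy as jnp

def loss(params):
    # Toy quadratic objective over two hyperparameters, e.g. (lr, wd).
    lr, wd = params[0], params[1]
    return (lr - 0.3) ** 2 + (wd - 0.1) ** 2

# Build a grid of candidate configurations.
lrs = jnp.linspace(0.0, 1.0, 50)
wds = jnp.linspace(0.0, 0.5, 50)
grid = jnp.stack(jnp.meshgrid(lrs, wds), axis=-1).reshape(-1, 2)

# Evaluate every configuration in parallel with a single vmapped call.
losses = jax.vmap(loss)(grid)
best = grid[jnp.argmin(losses)]
print("best (lr, wd):", best)
```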
Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇
🧵 Check out our latest preprint: "Programming by Backprop". What if LLMs could internalize algorithms just by reading code, with no input-output examples? This could reshape how we train models to reason algorithmically. Let's dive into our findings 👇
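For a sense of the setup being teased here: the model only ever sees source code during fine-tuning, then is queried for a program's output it was never shown. The data format and prompt below are my assumptions for illustration, not the paper's pipeline.

```python
# Hypothetical illustration: train on code alone, evaluate on execution.
train_example = {
    "text": (
        "def collatz_steps(n):\n"
        "    steps = 0\n"
        "    while n != 1:\n"
        "        n = 3 * n + 1 if n % 2 else n // 2\n"
        "        steps += 1\n"
        "    return steps\n"
    )
}

# At evaluation time, no input-output pairs were ever in training:
eval_prompt = "What does collatz_steps(6) return? Answer with a number only."
# A model that internalised the algorithm should answer 8
# (6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1).
```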