Sian Gooding @ ACL2025NLP 🇦🇹
@SianGooding
Senior Research Scientist @GoogleDeepMind working on Autonomous Assistants
New paper alert from @GoogleDeepMind! 🚨 We've put LLMs to the test as writing co-pilots – how good are they really at helping us write? LLMs are increasingly used for open-ended tasks like writing assistance, but how do we assess their effectiveness? 🤔 arxiv.org/abs/2503.19711
I'm thrilled to be at @DeepIndaba in Rwanda 🇷🇼 Let's collaborate to ensure AI's benefits reach everyone, everywhere. I look forward to connecting with you all 😍
We’re excited to welcome @verena_rieser, Senior Staff Research Scientist at @GoogleDeepMind, as our first keynote speaker at #DLI2025! Don't miss her talk on AI Alignment on Monday, August 18, 10:30 am GMT+2! Join us virtually now! 🌐 deeplearningindaba.com/2025/virtual-i…
Job advert is here: job-boards.greenhouse.io/deepmind/jobs/… Deadline: EOD Friday 1st August. Apply ASAP as we will look at candidates as they come in. Please DO NOT apply if you're looking for internships, or are graduating in 2026 or beyond. Wait for appropriate postings.
Do you have a PhD (or equivalent), or will you have one in the coming months (i.e. 2-3 months from graduating)? Do you want to help build open-ended agents that help humans do human things better, rather than replace them? We're hiring 1-2 Research Scientists! Check the 🧵👇
Eye Tracking + NLP = 😍 Attending ACL 2025? Looking for a new multimodal modeling challenge? Interested in cognitive modeling and NLP for science? Excited about human centered applications of NLP in areas such as education and content accessibility?
Google DeepMind just dropped a new LLM architecture called Mixture-of-Recursions. It gets 2x inference speed, reduced training FLOPs, and ~50% reduced KV cache memory. Really interesting read. Has potential to be a Transformer killer.
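The tweet doesn't link code, but the core idea it describes (one shared block of weights reused recursively, with a router assigning each token its own recursion depth) can be sketched in a few lines. This is a toy illustration only; `shared_block` and `router_depth` are hypothetical stand-ins, not the paper's actual components:

```python
import math
import random

random.seed(0)
D = 8  # hidden size

# One block's weights, shared across every recursion step (no per-layer params).
W = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]

def shared_block(h):
    # Stand-in for a full transformer layer: h @ W followed by tanh.
    return [math.tanh(sum(h[i] * W[i][j] for i in range(D))) for j in range(D)]

def router_depth(h, max_depth=3):
    # Hypothetical router: derives a per-token recursion depth from the hidden state.
    return int(sum(x * x for x in h) * 10) % max_depth + 1

def mixture_of_recursions(tokens):
    out = []
    for h in tokens:
        for _ in range(router_depth(h)):  # pass through the *same* weights d times
            h = shared_block(h)
        out.append(h)  # shallow-routed tokens exit early, saving compute
    return out
```

Parameter sharing is where the training-FLOP and memory savings would come from, and early exit is where the inference speedup would come from, under this reading of the tweet.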
Interesting post. However, it seems to be in conflict with the most central problem in theoretical computer science, P vs NP, which is exactly the question: is it fundamentally easier to verify a solution than to solve the problem? Most people believe that verification is…
New blog post about asymmetry of verification and the "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification, the idea that some tasks are much easier to verify than to solve, is becoming an important idea now that we have RL that finally works generally. Great examples of…
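The asymmetry is easy to see on a classic NP problem like subset-sum. This toy sketch (mine, not from the blog post) contrasts the exponential brute-force solver with the linear-time certificate check, which is exactly the kind of cheap verifier that makes a task RL-friendly:

```python
from itertools import combinations

def solve_subset_sum(nums, target):
    # Solving: brute force over all 2^n subsets of indices.
    for r in range(len(nums) + 1):
        for combo in combinations(range(len(nums)), r):
            if sum(nums[i] for i in combo) == target:
                return combo
    return None

def verify_subset_sum(nums, target, indices):
    # Verifying: one pass over the proposed certificate.
    return (len(set(indices)) == len(indices)
            and sum(nums[i] for i in indices) == target)
```

Any proposed answer can be checked in O(n), while finding one takes O(2^n) in the worst case; that gap is the "asymmetry".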
Excited to share our recent work, AuPair, an inference-time technique that builds on the premise of in-context learning to improve LLM coding performance! arxiv.org/abs/2502.18487
On my way to #ICML2025 to present our algorithm that strongly scales with inference compute, in both performance and sample diversity! 🚀 Reach out if you’d like to chat more!
💡Beyond math/code, instruction following with verifiable constraints is well suited to RLVR. But the set of constraints and verifier functions is limited, and most models overfit on IFEval. We introduce IFBench to measure model generalization to unseen constraints.
In our latest paper, we discovered a surprising result: training LLMs with self-play reinforcement learning on zero-sum games (like poker) significantly improves performance on math and reasoning benchmarks, zero-shot. Whaaat? How does this work? We analyze the results and find…
We've always been excited about self-play unlocking continuously improving agents. Our insight: RL selects generalizable CoT patterns from pretrained LLMs. Games provide perfect testing grounds with cheap, verifiable rewards. Self-play automatically discovers and reinforces…
Highly recommend reading this, or at least the intro and conclusion. Some gems about the future of safety research
here is my thesis “Safe Automated Research”. i worked on 3 approaches to make sure we can trust the output of automated researchers as we reach this new era of science. it was a very fun PhD
Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. How can we get a pulse check on whether current LLMs are capable of driving this kind of total…
Check out our take on Chain-of-Thought. I really like this paper as a survey of the current literature on what CoT is, but more importantly on what it's not. It also serves as a cautionary tale about the (apparently quite common) misuse of CoT as an interpretability method.
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren't necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵
As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism-yet many questions remain open. Here's our overview of the field🧵
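The subtask decomposition the thread motivates can be sketched with a toy options-style hierarchy on a 1-D grid: a high-level policy emits subgoals, and a low-level policy runs primitive actions until each subgoal is reached. This is a hypothetical illustration of temporal abstraction, not an algorithm from the overview:

```python
def low_level(pos, subgoal):
    # Low-level policy: primitive steps (+1/-1) until the subgoal is reached.
    steps = 0
    while pos != subgoal:
        pos += 1 if subgoal > pos else -1
        steps += 1
    return pos, steps

def high_level(start, goal, chunk=3):
    # High-level policy: decompose the long task into nearby subgoals.
    pos, trace = start, []
    while pos != goal:
        step = max(-chunk, min(chunk, goal - pos))
        subgoal = pos + step
        pos, steps = low_level(pos, subgoal)
        trace.append((subgoal, steps))
    return pos, trace
```

The high level reasons over a handful of subgoals instead of every primitive step — the payoff hierarchical RL formalizes, with the open question being how to *discover* such subgoals rather than hand-code them as done here.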
I'm looking forward to giving a keynote at #ACL2025NLP! See you in Vienna 🇦🇹
📣 And another one! 🌟 We're delighted to announce Verena Rieser from Google DeepMind "Whose Gold? Re-imagining Alignment for Truly Beneficial AI." 🤖 A discussion on technical and ethical challenges of building beneficial AI #ACL2025NLP #NLProc 2025.aclweb.org/program/keynot…
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
📢 Announcement! We're building a new type of word processor at @writewithmarker, and we're hiring for ProseMirror hackers and full-stack AI engineers to join the team in London Are you an engineer who cares about writing? Or do you know someone who does? Links below 👇