Sian Gooding @ ACL2025NLP 🇦🇹
@SianGooding
Senior Research Scientist @GoogleDeepMind working on Autonomous Assistants
New paper alert from @GoogleDeepMind! 🚨 We've put LLMs to the test as writing co-pilots – how good are they really at helping us write? LLMs are increasingly used for open-ended tasks like writing assistance, but how do we assess their effectiveness? 🤔 arxiv.org/abs/2503.19711
I'm thrilled to be at @DeepIndaba in Rwanda 🇷🇼 Let's collaborate to ensure AI's benefits reach everyone, everywhere. I look forward to connecting with you all 😍
We’re excited to welcome @verena_rieser, Senior Staff Research Scientist at @GoogleDeepMind, as our first keynote speaker at #DLI2025! Don't miss her talk on AI Alignment on Monday, August 18, 10:30 am GMT+2! Join us virtually now! 🌐 deeplearningindaba.com/2025/virtual-i…
Job advert is here: job-boards.greenhouse.io/deepmind/jobs/… Deadline: EOD Friday 1st August. Apply ASAP as we will look at candidates as they come in. Please DO NOT apply if you're looking for internships, or are graduating in 2026 or beyond. Wait for appropriate postings.
Do you have a PhD (or equivalent), or will you have one in the coming months (i.e. 2-3 months from graduating)? Do you want to help build open-ended agents that help humans do human things better, rather than replace them? We're hiring 1-2 Research Scientists! Check the 🧵👇
Eye Tracking + NLP = 😍 Attending ACL 2025? Looking for a new multimodal modeling challenge? Interested in cognitive modeling and NLP for science? Excited about human centered applications of NLP in areas such as education and content accessibility?
Google DeepMind just dropped a new LLM architecture called Mixture-of-Recursions. It gets 2x inference speed, reduced training FLOPs, and ~50% reduced KV cache memory. Really interesting read. Has potential to be a Transformer killer.
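The tweet doesn't link code, but the core idea it describes (one shared block of weights reused recursively, with a router assigning each token its own recursion depth) can be sketched in a few lines. This is a toy illustration only; `shared_block` and `router_depth` are hypothetical stand-ins, not the paper's actual components:

```python
import math
import random

random.seed(0)
D = 8  # hidden size

# One block's weights, shared across every recursion step (no per-layer params).
W = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]

def shared_block(h):
    # Stand-in for a full transformer layer: h @ W followed by tanh.
    return [math.tanh(sum(h[i] * W[i][j] for i in range(D))) for j in range(D)]

def router_depth(h, max_depth=3):
    # Hypothetical router: derives a per-token recursion depth from the hidden state.
    return int(sum(x * x for x in h) * 10) % max_depth + 1

def mixture_of_recursions(tokens):
    out = []
    for h in tokens:
        for _ in range(router_depth(h)):  # pass through the *same* weights d times
            h = shared_block(h)
        out.append(h)  # shallow-routed tokens exit early, saving compute
    return out
```

Parameter sharing is where the training-FLOP and memory savings would come from, and early exit is where the inference speedup would come from, under this reading of the tweet.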
Interesting post. However, it seems to be in conflict with the most central problem in theoretical computer science, P vs NP, which is exactly the question: is it fundamentally easier to verify a solution than to solve the problem? Most people believe that verification is…
New blog post about asymmetry of verification and the "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification, the idea that some tasks are much easier to verify than to solve, is becoming an important idea now that we have RL that finally works generally. Great examples of…
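The asymmetry is easy to see on a classic NP problem like subset-sum. This toy sketch (mine, not from the blog post) contrasts the exponential brute-force solver with the linear-time certificate check, which is exactly the kind of cheap verifier that makes a task RL-friendly:

```python
from itertools import combinations

def solve_subset_sum(nums, target):
    # Solving: brute force over all 2^n subsets of indices.
    for r in range(len(nums) + 1):
        for combo in combinations(range(len(nums)), r):
            if sum(nums[i] for i in combo) == target:
                return combo
    return None

def verify_subset_sum(nums, target, indices):
    # Verifying: one pass over the proposed certificate.
    return (len(set(indices)) == len(indices)
            and sum(nums[i] for i in indices) == target)
```

Any proposed answer can be checked in O(n), while finding one takes O(2^n) in the worst case; that gap is the "asymmetry".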
Excited to share our recent work, AuPair, an inference-time technique that builds on the premise of in-context learning to improve LLM coding performance! arxiv.org/abs/2502.18487
On my way to #ICML2025 to present our algorithm that strongly scales with inference compute, in both performance and sample diversity! 🚀 Reach out if you’d like to chat more!
💡Beyond math/code, instruction following with verifiable constraints is well suited to RLVR. But the set of constraints and verifier functions is limited, and most models overfit on IFEval. We introduce IFBench to measure model generalization to unseen constraints.
In our latest paper, we discovered a surprising result: training LLMs with self-play reinforcement learning on zero-sum games (like poker) significantly improves performance on math and reasoning benchmarks, zero-shot. Whaaat? How does this work? We analyze the results and find…
We've always been excited about self-play unlocking continuously improving agents. Our insight: RL selects generalizable CoT patterns from pretrained LLMs. Games provide perfect testing grounds with cheap, verifiable rewards. Self-play automatically discovers and reinforces…
Highly recommend reading this, or at least the intro and conclusion. Some gems about the future of safety research
here is my thesis “Safe Automated Research”. i worked on 3 approaches to make sure we can trust the output of automated researchers as we reach this new era of science. it was a very fun PhD
Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. How can we get a pulse check on whether current LLMs are capable of driving this kind of total…
Check out our take on Chain-of-Thought. I really like this paper as a survey of the current literature on what CoT is, but more importantly on what it's not. It also serves as a cautionary tale about the (apparently quite common) misuse of CoT as an interpretability method.
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren't necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵
As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism-yet many questions remain open. Here's our overview of the field🧵
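The subtask decomposition the thread motivates can be sketched with a toy options-style hierarchy on a 1-D grid: a high-level policy emits subgoals, and a low-level policy runs primitive actions until each subgoal is reached. This is a hypothetical illustration of temporal abstraction, not an algorithm from the overview:

```python
def low_level(pos, subgoal):
    # Low-level policy: primitive steps (+1/-1) until the subgoal is reached.
    steps = 0
    while pos != subgoal:
        pos += 1 if subgoal > pos else -1
        steps += 1
    return pos, steps

def high_level(start, goal, chunk=3):
    # High-level policy: decompose the long task into nearby subgoals.
    pos, trace = start, []
    while pos != goal:
        step = max(-chunk, min(chunk, goal - pos))
        subgoal = pos + step
        pos, steps = low_level(pos, subgoal)
        trace.append((subgoal, steps))
    return pos, trace
```

The high level reasons over a handful of subgoals instead of every primitive step — the payoff hierarchical RL formalizes, with the open question being how to *discover* such subgoals rather than hand-code them as done here.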
I'm looking forward to giving a keynote at #ACL2025NLP! See you in Vienna 🇦🇹
📣 And another one! 🌟 We're delighted to announce Verena Rieser from Google DeepMind "Whose Gold? Re-imagining Alignment for Truly Beneficial AI." 🤖 A discussion on technical and ethical challenges of building beneficial AI #ACL2025NLP #NLProc 2025.aclweb.org/program/keynot…
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
📢 Announcement! We're building a new type of word processor at @writewithmarker, and we're hiring for ProseMirror hackers and full-stack AI engineers to join the team in London Are you an engineer who cares about writing? Or do you know someone who does? Links below 👇