Anian Ruoss
@anianruoss
Quantitative Developer at Quadrature Previously: Google DeepMind (Gemini Diffusion) | ETH Zurich
We're looking for people to join us to work on Gemini Diffusion and help revolutionize language modeling! Details below: job-boards.greenhouse.io/deepmind/jobs/…
Excited to share what my team has been working on lately - Gemini diffusion! We bring diffusion to language modeling, yielding more power and blazing speeds! 🚀🚀🚀 Gemini diffusion is especially strong at coding. In this example the model generates at 2000 tokens/sec,…
Thrilled to share a major step forward for AI for mathematical proof generation! We are releasing the Open Proof Corpus: the largest ever public collection of human-annotated LLM-generated math proofs, and a large-scale study over this dataset!
Very nice related paper that somehow flew under my radar. VLM/LLM playing simple games (see pic) without scaffold. But potentially with in-context demo or parsed (non-RGB) observation. Nothing works, ICL doesn't help, though o1 nails oxo and crosswords, and everyone can pathfind.
If you want to see how VLMs without scaffolding compare to a random baseline on gameplay, check out LMAct: arxiv.org/abs/2412.01441 🙂
Today we introduced Gemini Diffusion⚡️ (& DeepThink, Veo3, Imagen4, 2.5 updates...). It's been a dream of mine to remove the need for "left to right" text generation. It's so fast, that we had to *slow down* the video during the presentation. deepmind.google/models/gemini-…
The Gemini Diffusion release feels like a landmark moment. For text generation, autoregressive models have always outperformed diffusion models on quality, and it wasn't clear that the gap could ever be closed. The team behind this has stayed laser-focused, broken…
What can be, unburdened by what has been 😇
A similar one inspired by the 'Sparks of AGI' paper by @SebastienBubeck et al.: "How many primes are there between 150 and 250? The first thing you should output is the total number, then print the exact list inside [ ] brackets." (ans: 18) GPT-4o fails this one too:…
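The prompt's expected answer is easy to verify directly; a few lines of Python confirm there are indeed 18 primes between 150 and 250:

```python
# Count and list the primes in [150, 250] via trial division.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return False
    return True

primes = [n for n in range(150, 251) if is_prime(n)]
print(len(primes))  # 18
print(primes)       # [151, 157, ..., 241]
```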
Super excited to have been part of the incredible journey with our team, bringing this to you all the way from research idea to Google IO!
We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
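Gemini Diffusion's internals aren't public, so purely as an illustration of the general idea the tweet describes (refining a fully noised sequence step-by-step instead of predicting tokens left to right), here is a toy masked-diffusion sketch. `toy_model`, its random confidence scores, and the tiny vocabulary are all stand-ins for a real trained denoiser:

```python
import random

MASK = "<mask>"
VOCAB = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "+"]

def toy_model(tokens):
    # Stand-in for a real denoiser: propose a (token, confidence)
    # guess for every still-masked position. A real model would
    # condition on the prompt and the partially denoised sequence.
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def denoise(length=8, steps=4, seed=0):
    random.seed(seed)
    tokens = [MASK] * length           # start from pure "noise"
    per_step = max(1, length // steps)
    while MASK in tokens:
        guesses = toy_model(tokens)
        # Commit only the most confident positions this step,
        # leaving the rest masked for later refinement passes.
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)
        for i in best[:per_step]:
            tokens[i] = guesses[i][0]
    return tokens

print(denoise())
```

Because whole groups of positions are filled in parallel each step, this style of decoding is what makes the very high tokens/sec figures possible.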
🔥 Gemini Diffusion is blazing fast 🔥 Honored to have been part of this amazing team!
Come chat with @anianruoss @bonniesjli and me at our LMAct poster at the #ICLR25 workshop on Reasoning and Planning for LLMs (Garnet 212-213) to find out whether frontier models imitate expert behaviour purely in context!
LMs see, can LMs do? LMAct benchmarks current SOTA foundation models' ability to act in text/visual environments using text as low-level actions in many domains using in-context expert (multimodal) demonstrations. We're excited to see how this benchmark drives further progress!
We provide first insights into why prompting is hard. The training distribution matters a lot: if we don't know it (as with large language datasets), prompting is like shooting in the dark. Our results on prediction and in-context RL are intriguing! 1/n arxiv.org/pdf/2502.10760
Results of the second part of AIME 2025 are live on matharena.ai: Another convincing win for @openai's o3-mini 🥇 Great work by the entire MathArena team: @j_dekoninck, @ni_jovanovic and @IvoPetrov01!
We finally have an answer to the debate over whether LLMs generalize to new math problems or merely memorize the answers. We evaluated them on the AIME 2025 I competition from *yesterday*, and the results are good!
Check out the recent work by @anianruoss Eg openreview.net/forum?id=Xlpip…