Luca Ambrogioni
@LucaAmb
Assoc. Prof. of Machine Learning. PI of Generative Memory Lab (@DondersInst). Statistical physics, generative diffusion, memory, and generalization.
Consistency Variational Autoencoders (CoVAE) follow naturally from β-VAEs. A family of β-VAEs (with increasing β) can be organized as a sequence of latent encodings with decreasing SNR. This implicit definition of a 'forward process' is used to define a consistency-style loss!
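The decreasing-SNR view can be sketched numerically. This is a minimal toy illustration under an assumed linear-Gaussian encoding, not CoVAE's actual implementation; the schedule, the trivial decoder `f`, and all names here are hypothetical.

```python
import numpy as np

# Toy sketch: a family of beta-VAE-like encodings ordered by decreasing SNR.
# Assume each beta induces a linear-Gaussian latent z_t = alpha_t * x + sigma_t * eps,
# so SNR_t = alpha_t^2 / sigma_t^2 shrinks as beta grows (an implicit forward process).

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))            # toy data batch

T = 5                                   # number of noise levels (beta values)
alphas = np.linspace(1.0, 0.2, T)       # signal coefficient shrinks with beta
sigmas = np.linspace(0.1, 1.0, T)       # posterior noise grows with beta
snr = alphas**2 / sigmas**2             # monotonically decreasing SNR schedule

def encode(x, t, rng):
    """Sample a latent at noise level t of the implicit forward process."""
    eps = rng.normal(size=x.shape)
    return alphas[t] * x + sigmas[t] * eps

def consistency_loss(f, x, t, rng):
    """Consistency-style objective: the decoder f applied at adjacent
    noise levels should map latents to the same reconstruction."""
    z_t = encode(x, t, rng)
    z_s = encode(x, t - 1, rng)
    return np.mean((f(z_t, t) - f(z_s, t - 1)) ** 2)

# trivial stand-in 'decoder' that rescales by the signal coefficient
f = lambda z, t: z / alphas[t]
loss = consistency_loss(f, x, T - 1, rng)
```

In a real model, `f` would be a trained network and the loss would be minimized over all adjacent noise levels; here it only shows the structure of the objective.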

Transformers haven't changed much since 2017, but there have been some innovations over the years. This is an excellent summary of architectural differences in recent LLMs. Nice diagrams too! 👏 It would be great to see something like this for diffusion Transformers as well 🤔
From GPT to MoE: I reviewed & compared the main LLMs of 2025 in terms of their architectural design, from DeepSeek-V3 to Kimi K2: Multi-head Latent Attention, sliding window attention, new Post- & Pre-Norm placements, NoPE, shared-expert MoEs, and more... magazine.sebastianraschka.com/p/the-big-llm-…
Apply for the AITHYRA-CeMM International PhD Program! 15-20 fully funded PhD fellowships available in Vienna in AI/ML and Life Sciences. Deadline for applications: 10 September 2025. apply.cemm.at
Math Olympiads are a very easy benchmark for LLMs. There is tons of nearly identical training data available, and the problems have clear, unambiguous solutions that can be used for RL. Most problems can be solved largely by memory and trial-and-error. This doesn't generalize to real math.
You’ve heard of water turning into steam. But have you heard of hot gas turning into a black hole? Meet the Hawking–Page transition 🧵
📢 Excited to announce that GenMol is now open-sourced. GenMol: A Drug Discovery Generalist with Discrete Diffusion Paper: arxiv.org/abs/2501.06158 Code: github.com/NVIDIA-Digital…
🚀 GenMol is now open‑sourced: you can now train and finetune on your data! It uses masked diffusion + a fragment library to craft valid SAFE molecules, from de novo design to lead optimization. #GenMol #DrugDiscovery #Biopharma
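The masked-diffusion idea behind GenMol can be sketched as a toy sampler: start from a fully masked sequence and iteratively unmask positions. This is a hypothetical illustration, not GenMol's code; the tiny vocabulary and uniform token draws stand in for the real fragment library and the learned denoising model.

```python
import random

# Toy masked discrete diffusion sampling (illustration only, not GenMol's code):
# begin fully masked, then reveal one position per reverse step, filling it with
# a token from the vocabulary (in practice, sampled from a trained model).

MASK = "[MASK]"
VOCAB = ["C", "N", "O", "c1ccccc1"]     # hypothetical fragment vocabulary

def sample(length, steps, rng):
    seq = [MASK] * length
    for _ in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        i = rng.choice(masked)          # pick one masked position per step
        seq[i] = rng.choice(VOCAB)      # stand-in for the model's prediction
    return seq

rng = random.Random(0)
out = sample(length=6, steps=6, rng=rng)
```

Conditioning on a partial scaffold (for lead optimization rather than de novo design) would simply mean initializing `seq` with some positions already fixed.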
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an…
How to build a factual but creative system? It is a question surrounding memory and creativity in modern ML systems. My colleagues from @IBMResearch and @MITIBMLab are hosting the @MemVis_ICCV25 workshop at #ICCV2025, which explores the intersection between memory and generative…
Interesting approach! However, we looked at the proofs and methodology and found a few problems, specifically with the use of hints given to the model. While the scaffold indeed improves performance, it does not solve all problems accurately and would not get a gold medal.🧵
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
🌞🌞🌞 The third Structured Probabilistic Inference and Generative Modeling (SPIGM) workshop is **back** this year with @NeurIPSConf in San Diego! In the era of foundation models, we focus on a natural question: is probabilistic inference still relevant? #NeurIPS2025
Class act from Google DeepMind. Much respect 🫡
Huge thanks to all my friends and advisors who helped me develop this work. Specifically, this paper would never have happened without @wellingmax's guidance. See the blog for an intro, and the paper for all the proofs! Blog: kempnerinstitute.harvard.edu/research/deepe… Code: github.com/akandykeller/F…
The problem comes in when you believe that demographic diversity is of such overriding importance that it requires suppression of the diversity of ideas, which is what is actually core to the scientific endeavor.
To retreat from diversity, equity, and inclusion as a core aspect of the scientific endeavor is to close the door to possibilities for better science and a better future.
I have moved to substack. This is my first post, based on a couple of threads I did recently. Link below.
Are you studying how structure shapes computation in the brain and in AI systems? 🧠 Come share your work in San Diego at NeurReps 2025! There is one month left until the submission deadline on August 22: neurreps.org/call-for-papers
We are hiring on the Veo team!📽️ Some people asked me about this at #ICML2025. If that's you, I will have told you to check deepmind.google/careers/ regularly. 👀It's just been updated: Europe (London, Zurich) job-boards.greenhouse.io/deepmind/jobs/… US (Mountain View) job-boards.greenhouse.io/deepmind/jobs/…
Want to be part of a team redefining SOTA for generative video models? Excited about building models that can reach billions of users? The Veo team is hiring! We are looking for amazing researchers and engineers, in North America and Europe. Details below:
Google DeepMind followed IMO rules to earn gold, unlike OpenAI
Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, and DINOv2 on various benchmarks, setting a new standard for open-source research🧵
thirty years ago my very first journal article got accepted without revision. i remember thinking "this publishing thing isn't that bad" - it’s been downhill ever since.
stealing the spotlight from kids just to hype yourselves is not a good look
🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closing ceremony to announce results. OpenAI announced the results BEFORE the closing ceremony. According to a Coordinator on Problem 6, the one problem OpenAI…
We are hiring! If you are interested in efficient architecture or making training and inference on thousands of GPUs much faster, please feel free to dm me or @WeizhuChen! We are doing RL on very large scales!
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 github.com/microsoft/Arch… (1/4)