Giannis Daras
@giannis_daras
MIT CSAIL Postdoc 👨‍🎓 Ph.D. Computer Science @UTAustin 👨‍💻 Ex: @nvidia, @google, @explosion_ai, @ntua
Announcing Ambient Diffusion Omni — a framework that uses synthetic, low-quality, and out-of-distribution data to improve diffusion models. State-of-the-art ImageNet performance. Strong text-to-image results in just 2 days on 8 GPUs. Filtering ❌ Clever data use ✅

What a pleasant end to #ICML2025, winning Best Paper at @genbio_workshop with the dream team for our paper FORT: Forward-Only Regression Training of Normalizing Flows 🌊
Wrapping up #ICML2025 on a high note — thrilled (and pleasantly surprised!) to win the Best Paper Award at @genbio_workshop 🎉 Big shoutout to the team that made this happen! Paper: Forward-Only Regression Training of Normalizing Flows (arxiv.org/abs/2506.01158) @Mila_Quebec
1/ Where do Probabilistic Models, Sampling, Deep Learning, and Natural Sciences meet? 🤔
The workshop we’re organizing at #NeurIPS2025! 📢
FPI@NeurIPS 2025: Frontiers in Probabilistic Inference – Learning meets Sampling
Learn more and submit → fpiworkshop.org…
Protein structure prediction contest CASP gets temporary funding from Google DeepMind as NIH grant runs out. trib.al/bGoz7lf
That’s so funny
we’re not kfc but come watch us cook with our feynman-kac correctors, 4:30 pm today (july 16) at @icmlconf poster session — east exhibition hall #3109 @k_neklyudov @AlexanderTong7 @tara_aksa @OhanesianViktor
Great work from my labmate @ShivamDuggal4
Compression is the heart of intelligence. From Occam to Kolmogorov: shorter programs = smarter representations.
Meet KARL: Kolmogorov-Approximating Representation Learning. Given an image, a token budget T, and a target quality ε, KARL finds the smallest t ≤ T to reconstruct it within ε 🧵
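A minimal sketch of the adaptive-budget idea in that tweet, assuming abstract `encode`/`decode` callables, an MSE quality measure, and a plain linear search — all illustrative stand-ins, not KARL's actual interface or algorithm:

```python
import torch

def smallest_sufficient_budget(image, encode, decode, T, eps):
    """Return the smallest token count t <= T whose reconstruction
    falls within eps of the input; fall back to T if none qualifies."""
    for t in range(1, T + 1):
        tokens = encode(image, num_tokens=t)   # assumed encoder interface
        recon = decode(tokens)                 # assumed decoder interface
        if torch.mean((recon - image) ** 2).item() <= eps:
            return t
    return T
```

Easy images then get short codes and hard images get long ones, which is the "shorter programs = smarter representations" intuition in executable form.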
GPT-3: scale compute by 10x to get a good model
Grok-4: scale RL compute by 10x to get a good model
Llama-5: scale employee comp by 10x to get a good model
Joint work with the amazing Jeffrey Zhang (@zhang_ouyang) (equal contribution) and with wonderful people: D. Diaz (@aiproteins), K. Ravishankar, W. Daspit, A. Klivans, C. Daskalakis (@KonstDaskalakis), Q. Liu. It's also my first paper in the proteins space, so show it some love!
Ambient Protein Diffusion treats low-pLDDT AlphaFold structures as low-quality data. Instead of filtering them out (as done in prior work), we use them only for a subset of the diffusion times: enough noise "erases" the AlphaFold mistakes, and we can still learn from those structures.
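A hedged sketch of that training rule, assuming one pLDDT confidence score per structure; the cutoff, timestep range, and function name are illustrative assumptions, not the paper's actual code:

```python
import torch

T = 1000             # diffusion timesteps (illustrative)
PLDDT_CUTOFF = 80.0  # hypothetical confidence cutoff for "high quality"
T_MIN_LOW = 400      # hypothetical earliest timestep for low-pLDDT data

def sample_training_timestep(plddt: torch.Tensor) -> torch.Tensor:
    """Sample a per-example timestep: high-confidence structures train at
    all noise levels, low-confidence ones only in the high-noise regime,
    where the added noise masks their errors."""
    t_min = torch.where(plddt >= PLDDT_CUTOFF,
                        torch.zeros_like(plddt),
                        torch.full_like(plddt, float(T_MIN_LOW)))
    u = torch.rand_like(plddt)
    return (t_min + u * (T - t_min)).long()

# Example: one confident and one low-confidence structure in a batch.
timesteps = sample_training_timestep(torch.tensor([92.4, 55.1]))
```

The design choice: rather than a binary keep/discard filter, each structure gets a per-sample noise floor, so noisy data still contributes gradient signal wherever its errors are below the noise level.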
The results are quite strong: Ambient Protein Diffusion substantially outperforms previous baselines on both short and long protein generation. For short proteins, we dominate the Pareto frontier between designability and diversity, using a ~13x smaller model than the previous SOTA.
Ambient Proteins: Training Diffusion Models on Low Quality Structures
1. A new framework, Ambient Protein Diffusion, revolutionizes protein structure generation by leveraging low-confidence AlphaFold structures as valuable, corrupted training data instead of discarding them.…
Ambient Proteins: Training Diffusion Models on Low Quality Structures 🤔 biorxiv.org/content/10.110…