Hannes Stärk
@HannesStaerk
@MIT PhD student • ML for molecules and biology https://bsky.app/profile/hannes-stark.bsky.social
New paper (and #ICLR2025 Oral :)): ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids arxiv.org/abs/2503.05025 Condition on your 3D layout (of ellipsoids) to generate proteins like this or to get better designability/diversity/novelty tradeoffs. 1/6
Excited to share: “Learning Diffusion Models with Flexible Representation Guidance” With my amazing coauthors @zhuci19, @sharut_gupta, @zy27962986, @StefanieJegelka, @stats_stephen, Tommi Jaakkola Paper: arxiv.org/pdf/2507.08980 Code: github.com/ChenyuWang-Mon…
Presenting La-Proteina! A new model for scalable, all-atom protein design 🧬 Backbone + sequence + side-chains, indexed and unindexed atomistic motif scaffolding, scalable up to 800 residues, and more… A thread 🧵
📢📢 "La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching" Fully atomistic. Partially latent. Structurally precise. Entirely generative. w/ @tomasgeffner*, @DidiKieran*, et al. 📜 Project page & paper: research.nvidia.com/labs/genair/la… 🧵 Thread below... (1/n)
We also presented this work in @HannesStaerk's reading group. Go check it out if interested! (youtu.be/0r25eXy-Bgc?si…) Code coming soon! (12/12)
My labmate, officemate, and co-author @felix_faltings will be at #ICML2025 presenting ProxelGen in the @genbio_workshop ! Anyone interested in discussing ProxelGen, ProtFID, or our current biomolecular design work? (ProxelGen link arxiv.org/pdf/2506.19820)
Unfortunately, not able to attend #ICML2025 this year, but happy to share our accepted paper: SPHINX: Structural Prediction using Hypergraph Inference Network w/ @pl219_Cambridge arxiv.org/pdf/2410.03208 1/2
🚧 Important warning about novelty computation in prior work 🚧 While working on the paper, we realized that a FoldSeek bug has affected novelty numbers in prior works. If you are using FoldSeek to report novelty, we strongly recommend using FoldSeek 10 onwards.
Announcing Ambient Protein Diffusion, a state-of-the-art 17M-params generative model for protein structures. Diversity improves by 91% and designability by 26% over previous 200M SOTA model for long proteins. The trick? Treat low pLDDT AlphaFold predictions as low-quality data
Had a lot of fun learning diffusion and addressing key issues in protein diffusion with @giannis_daras @zhang_ouyang TLDR: a few protein structure insights inspired us to design a new diffusion loss, training regime and dataset, resulting in significant performance improvements
Starkly Speaking tomorrow: @bwood_m will present "UMA: A Family of Universal Models for Atoms" ai.meta.com/research/publi… Join us on Zoom 12pm ET / 6pm CEST: portal.valencelabs.com/starklyspeaking

Instead, we represent proteins as 3D densities sampled on a discrete grid. Like image pixels, we're calling this representation *proxels* 🙃 (3/8)
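To make the "proxel" idea concrete, here is a minimal sketch of turning atom coordinates into a density on a discrete 3D grid. It assumes one isotropic Gaussian per atom and a cubic box centered at the origin; the function name `coords_to_density` and the parameters `grid_size`, `box`, and `sigma` are hypothetical choices for illustration, not taken from the ProxelGen paper or code.

```python
import numpy as np

def coords_to_density(coords, grid_size=16, box=16.0, sigma=1.0):
    """Sum one isotropic Gaussian per atom onto a cubic grid (illustrative only)."""
    # Voxel centers along one axis, spanning [-box/2, box/2].
    axis = np.linspace(-box / 2, box / 2, grid_size)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)  # shape (G, G, G, 3)

    density = np.zeros((grid_size,) * 3)
    for atom in coords:
        # Squared distance from this atom to every voxel center.
        sq_dist = np.sum((grid - atom) ** 2, axis=-1)
        density += np.exp(-sq_dist / (2 * sigma ** 2))
    return density

# Two toy "atoms" 3.8 units apart (an assumed spacing, chosen for illustration).
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]])
density = coords_to_density(coords)
```

The result is a dense array that can be handled like a 3D image, which is what the pixel analogy in the post points at.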
📌Notes on Boltz-2 Just watched the video talk led by @GabriCorso, @jeremyWohlwend, and Saro Passaro that introduced Boltz-2, a structural biology foundation model. I summarized some learning notes below 🧵
Couldn't summarize the affinity prediction part yet, saving for later. Boltz-2 Paper: biorxiv.org/content/10.110… Code: github.com/jwohlwend/boltz Talk: youtube.com/watch?v=iHDauM…
Very happy about our new ProxelGen piece! We explore generating proteins represented as densities instead of the standard 3D point clouds. I am excited to see what this can enable in terms of training on broader density data in the future 👌
Excited to share our new ProxelGen paper! Completely different from RFDiffusion etc., we generate proteins as densities instead of point clouds. Turns out this works just as well and e.g. does better on some scaffolding tasks. arxiv.org/abs/2506.19820 (1/8)
One thing that really bothers me with the new "virtual cell" terminology is that it is currently largely focused on a very narrow definition of models that can predict effects of trans perturbations (gene dosage, drugs etc) on gene expression. 1/
Protein FID: Improved Evaluation of Protein Structure Generative Models 1. This paper proposes Protein FID, a new evaluation metric for generative protein structure models, addressing key limitations of current metrics like designability, diversity, and novelty, which often…
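For readers unfamiliar with FID-style metrics, the sketch below computes the standard Fréchet distance between two Gaussians, as used in image FID. It is an assumption that Protein FID follows this form; how the paper embeds protein structures and which statistics it compares are not stated in the post. The function name `frechet_distance` and its inputs (means and covariances of two embedding sets) are hypothetical.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)

    mean_term = np.sum((mu1 - mu2) ** 2)
    # Tr((sigma1 sigma2)^{1/2}) equals the sum of square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative
    # for positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    sqrt_trace = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return mean_term + np.trace(sigma1) + np.trace(sigma2) - 2.0 * sqrt_trace

# Toy check: identical covariances, means differing by (3, 4).
dist = frechet_distance([0.0, 0.0], np.eye(2), [3.0, 4.0], np.eye(2))
```

In practice the means and covariances would be estimated from embeddings of generated and reference samples; lower values indicate closer distributions.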
38/ The next paper in this series is the #Boltz2 paper, which describes an open-source AI model that predicts both 3D protein structures and binding affinities, approaching the gold-standard FEP simulations but 1000x faster ⚡ Why does this matter? Binding affinity = how tightly a…