Yusuf Roohani
@yusufroohani
Machine Learning & Systems Biology. ML Group Leader @arcinstitute. PhD @StanfordAILab
Cells are dynamic, messy, and context dependent. Scaling models across diverse states requires the flexibility to capture heterogeneity. Introducing State, a transformer that predicts perturbation effects by training over sets of cells. Team effort led by the unstoppable @abhinadduri

Dropped the Virtual Cell Challenge Primer on HF. We are shipping transformers support for STATE (the SOTA model for predicting perturbation response) very soon!
In 1h we discuss "Predicting cellular responses to perturbation across diverse contexts with State" from @arcinstitute in the reading group with the author @abhinadduri! biorxiv.org/content/10.110… Join us on Zoom at 9am PT / 12pm ET: portal.valencelabs.com/starklyspeaking
🧬 Excited to open-source Biomni! With just a few lines of code, you can now automate biomedical research with an AI agent! We are releasing Biomni A1 (agent) + E1 (env) with 150 specialized tools, 59 databases, and 105 software packages. E1 is our first attempt at curating the bio-agent…
Our Virtual Cell Challenge commentary is one of the most read @CellCellPress articles for the last month, alongside Weinberg's classic on the hallmarks of cancer
We updated the State Embedding 600M checkpoint on the @ArcInstitute Hugging Face. This model was trained with 4x FLOPs compared to the preprint model. It achieves significantly lower val/loss and does better on internal evals - would recommend using this over the 4-epoch one for…
Last week @arcinstitute released the Virtual Cell Challenge 🧬 The goal is to train a model capable of simulating a cell. I wrote a primer for engineers without a biology background.
“[We have] a much simpler goal: make the existing models good enough that experimentalists adopt and use them. Like the “GPT Moment,” this may not require any semblance of perfection.” Was a pleasure speaking with Elliott! Great take on the field and how State is changing things
What Are Virtual Cells? centuryofbio.com/p/virtual-cell Two months ago, I started working on an essay to answer this question. The goal was to cover a few of the recent research results. Simple as that! Instead, I went down a rabbit hole exploring ideas around cellular simulation…
One-month update of Biomni⬇️ Excited to see how Biomni has automated 15K+ research tasks for biologists!
🧬 1 month update of Biomni: the general-purpose biomedical AI agent! 🌍 Scientists from 2K+ organizations in 76 countries registered 🤖 15K+ research tasks automated — saving millions of biologists hours 💻 12M+ lines of code written 🔥 3B+ tokens burned 🧬 Spanned across…
Tahoe-100M FTW 💪
The result I'm most excited about from Arc's new State model: The ability to generalize on zero-shot out-of-distribution predictions after pre-training on the TAHOE-100M data set. Whereas PLMs have seemingly benefitted less from scaling data and model size, this is an inkling…
Thanks for the shoutout @ElliotHershberg! This was one of the most exciting parts for us as well. We also found that when using cell embeddings, pre-training on Tahoe-100M improved our zero-shot transfer on genetic or signaling datasets!
We’re also excited about this result! Zero-shot prediction showed clear value of embeddings:
- Pretraining State on Tahoe-100M
- Fully fine-tuning on smaller, noisier datasets
- Led to more accurate perturbation ranking prediction than mean baselines or HVG-trained State models
So cool: @tahoe_ai is giving $25K (on top of the $100k + $50k + $25k) to the best open-source model on @huggingface for the @arcinstitute's Virtual Cell Challenge! Open-source + AI + biology = 🔥🔥🔥
Enjoyed talking with @xiaofei_lin from @GENbio about State, our latest AI model for predicting cellular responses to perturbation across diverse contexts
.@arcinstitute has announced the inaugural “Virtual Cell Challenge,” sponsored by @nvidia, @10xGenomics, and @UltimaGenomics, which will evaluate the ability of AI models to generalize to new cell contexts for therapeutic applications. 1/ genengnews.com/topics/artific…
Getting to the virtual cell is the holy grail of life science. Now there's a challenge on to accelerate it cell.com/cell/fulltext/… @arcinstitute @CellCellPress @nvidia @10xGenomics @yusufroohani @StanfordAILab @davey_burke @UltimaGenomics