Johnny Yu
@iamjohnnyyu
CSO and co-founder @tahoe_ai (formerly Vevo), UCSF PhD. Building the next big thing in single-cell drug discovery!
1/ 🧵 THREAD: Here’s how we solved a 30-year mystery about an HIV drug. 💊 A drug used for over two decades. ⚠️ Pulled from the market for cardiac risk. 🤯 And no one knew why, until now. We found the missing mechanism using our single-cell mosaic dataset, Tahoe-100M. Let’s talk…
On the floor of NYSE for AI x BIO 2025! Thanks @ameekapadia @pablolubroth for the awesome energy and hosting such a talented crowd. 🧬🚀 @nalidoust

2/N We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs.
We love it when others build dev tools for building on Tahoe-100M. Today, we highlight scDataset in our @tahoe_ai blog. Developed by @davide_dascenzo & @sebacultrera, it makes it easier to train DL models on datasets with the daunting scale of Tahoe-100M (see poster in ICML '25)
Today, @NIH announced a new policy to cap how much publishers can charge NIH-supported scientists to make their work publicly accessible. This reflects our broader effort to restore public trust in public health by creating an open, honest, and transparent research atmosphere.…
Completely agree. Pre-training on quality data for the first time allows for context generalization -> an important milestone in Virtual Cell. Not all data is created equal! 🖥️🧬
The result I'm most excited about from Arc's new State model: The ability to generalize on zero-shot out-of-distribution predictions after pre-training on the TAHOE-100M data set. Whereas PLMs have seemingly benefitted less from scaling data and model size, this is an inkling…
I f*** love this. And to put even more momentum behind this awesome movement by our @arcinstitute friends: 🥁@tahoe_ai will give $25K to the best model that is also open-sourced with weights on @huggingface for everyone's use! + maybe a mention on X by @ClementDelangue? ;)
Register today for the Virtual Cell Challenge and use AI to solve one of biology’s most complex problems. Announced in @CellCellPress, the competition is hosted by Arc Institute and sponsored by @nvidia, @10xGenomics, and @UltimaGenomics.
For anyone interested in where virtual cell is going - this is a very timely and landmark piece of work. A few takeaways: - Data matters! 💾 High-quality, unified datasets like Tahoe100M vastly improve model performance - Embeddings are getting good enough for tasks 🔨 - with…
Today @arcinstitute releases State, our first perturbation prediction AI model and an important step towards our goal of a virtual cell. State is designed to learn how to shift cells between states (e.g. “diseased” to “healthy”) using drugs, cytokines, or genetic perturbations.
How is it going to be the first when Noetik has already announced? noetik.ai/octo-vc And CZI biohub chanzuckerberg.com/science/techno…
This short video says it all - it's writing, running, and visualizing the full bioinformatics pipeline just like a computational biologist!
So we teamed up with Kepler AI @keplogic to build TahoeDive: A natural language interface that enables biologists to do analysis that requires a team of bioinformaticians: dig into Tahoe-100M — and any datasets derived from it — alongside key insights from the literature. 🔍
Join the waitlist, it'll fill up quickly! Full thread here for TahoeDive 👇
Today, we @tahoe_ai are announcing TahoeDive, built in partnership with Kepler AI @keplogic + opening beta access! TahoeDive is an AI agent for biologists to query & analyze our Tahoe-100M dataset using natural language, with added context from broader scientific literature. 🧵
Excited to launch the waitlist for TahoeDive, our first AI agent built on Tahoe-100M - check it out, I expect this waitlist to fill up quickly.
We’re excited to invite a select group of biologists to try it and share feedback. Request beta access, read more, and see example demos here 👉 tahoebio.ai/tahoedive Let us know what you think 💬
🧵1/ We @tahoe_ai just published a new post on the Tahoe blog—a story of how we used Tahoe-100M, the world’s largest drug-perturbed single-cell dataset, to find compounds that upregulate MHC-I and make tumors more visible to the immune system. Here’s how 🧬🔍👇
It was great to give a talk to the community about Tahoe-100M! Big thanks to our partners in science @ParseBio and @UltimaGenomics for making projects like Tahoe the new gold standard in single cell. More to come, stay tuned 😉🎙️ #tahoe100m #singlecell
Thank you to everyone who joined our webinar this week as we took a closer look at the Tahoe-100M data set, built by the team at @tahoe_ai (formerly Vevo) in collaboration with Parse's GigaLab. Big thanks to our speaker, @iamjohnnyyu. Watch on-demand: parse.bio/3FmfoVc
I am excited to announce the winners of our @tahoe_ai Tahoe Deep Dive hackathon with @huggingface. We created and released Tahoe-100M to start a movement and a community. And Tahoe Deep Dive Hackathon was the first occasion for this community to get together physically and…
“The most talent-dense set of teams I’ve ever seen at a demo day” 💅
1/ just hosted demo day for the 4th batch of Embed. we’re proud to get to be on the journey with all of them and excited to share all of them with you today!
This was most excellent. Amazing founders here today.
.@conviction Embed demo day T-30 minutes
The @tahoe_ai hackathon just wrapped up! Such an inspiring environment with great energy, and genuinely fun people. Huge kudos to the @tahoe_ai team for organizing it so well. Extra proud to see the @scverse_team members in the teams that took all of the top 3 spots! What’s next!
Team Dawo for the win! Very impressive work 📊
Excited to share that my team and I won 2nd at the @tahoe_ai Deep Dive Hackathon!! We used Tahoe-100M to build a VAE workflow that can predict what drug caused a cell to go from its original state to its perturbed state. Thank you @nalidoust and the @tahoe_ai team!