Brian Hie
@BrianHie
AI for biology @Stanford and @arcinstitute
We trained a genomic language model on all observed evolution, which we are calling Evo 2. The model achieves an unprecedented breadth in capabilities, enabling prediction and design tasks from molecular to genome scale and across all three domains of life.

There is no rest when training models at scale. You spend some days in an industrial forge, sparks flying as the machines clang and sputter. You spend other days in the operating room, performing a delicate surgery for a newly discovered condition, hoping the patient recovers.
Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date. For the first time, our reasoning models can agentically use and combine every tool within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation.
Very cool work on scaling data for protein language modeling, congrats to the team!
In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known. Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.
Should now be much easier to install and run Evo 2
Evo 2 update: new dependency versions (torch, transformer engine, flash attn) and a Docker option mean it should be easy to set up without needing to compile locally. Happy ATGC-ing! github.com/ArcInstitute/e…
"The body of data available in protein sequences is something fundamentally new in biology and biochemistry, unprecedented in quantity, in concentrated information content and in conceptual simplicity." - Margaret Dayhoff describing my research better than me before I was born
Stanford biochemist Lingyin Li is studying a tumor-fighting “miracle molecule” that could one day inform therapies for cancer, as well as autoimmune, neurodegenerative, and age-related diseases: stanford.io/3I5xqfx
Register for this challenge and congrats to the team for making this available to the community!
Register today for the Virtual Cell Challenge and use AI to solve one of biology’s most complex problems. Announced in @CellCellPress, the competition is hosted by Arc Institute and sponsored by @nvidia, @10xGenomics, and @UltimaGenomics.
Cells are dynamic, messy and context dependent. Scaling models across diverse states needs flexibility to capture heterogeneity. Introducing State, a transformer that predicts perturbation effects by training over sets of cells. Team effort led by the unstoppable @abhinadduri
Introducing Arc Institute’s first virtual cell model: STATE
🎉New preprint!🎉 Extremely excited to share CryoBoltz❄️⚡️, led by superstar @rishwanth_raghu! We develop a multiscale guidance recipe to steer structure prediction models (e.g. AlphaFold3 / Boltz-1) towards experimental cryo-EM density maps, including heterogeneous,…
Excited to present CryoBoltz ❄️⚡, a multiscale guidance approach for steering AlphaFold3/Boltz-1 to sample structures that are consistent with experimental cryo-EM density maps. 🧵1/7 arxiv.org/abs/2506.04490 Joint work with @axlevy0 @GordonWetzstein & @ZhongingAlong!
Today is my first day as Writer-in-Residence at @arcinstitute. I'll be writing about the Virtual Cell, genome editing + much more. Grateful for the invitation to spend my summer here and learn more about AI+Bio!
Over the last 2 weeks, I took a deep dive into Evo 2, Arc's genomic foundation model. But I couldn't find a crisp primer on Evo 2 that covered the decisions for the ML architecture, the inference-time scaling results, or the mechanistic interpretability results. So, I wrote one!
Announcing Evo 2: The largest publicly available AI model for biology to date, capable of understanding and designing genetic code across all three domains of life. arcinstitute.org/manuscripts/Ev…
This is beautiful work, congrats to the team!
Wanted to highlight our latest preprint--a huge effort by multiple people and labs, but led primarily by Will DeWitt (UW) and Tatsuya Araki and Ashni Vora (our lab), in a very close wet-dry collaboration with @ematsen’s lab at the Hutch biorxiv.org/content/10.110…
.@ArcInstitute is redefining how biomedical research is conducted. Looking to harness body-brain communication to counteract human disease, our lab is exploring new approaches through shared tools & collaboration. Read more in our Q&A. arcinstitute.org/news/news/chri…
For neurodegenerative diseases like Alzheimer’s and Parkinson’s, treatment options are scant. New research identifies a promising access point to therapeutics. news.stanford.edu/stories/2025/0…
🥳
So pleased to share that the Executive Committee of MIT has approved the promotion of Bryan Bryson to Associate Professor with Tenure.
Thrilled that our work "Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN" has been accepted to ICML 2025! 🥳 Looking forward to connecting with the ML and comp bio communities in Vancouver this July! :)
Excited to share our joint work with @richardwshuai, Full-Atom MPNN (FAMPNN), a protein sequence design method that explicitly models both sequence and side-chain structure! 🧵 1/N
Catch @talaldotpdb and @richardwshuai at ICML!
I'm joined by @AnnaMarieWagner, a real expert in AI and bio, and we’re talking about some of the incredible speakers coming to #SynBioBeta2025 this year. One of the people we are especially excited about is @BrianHie from the @arcinstitute. Brian is doing groundbreaking work…
What could scaling unlock for biology? Introducing ProGen3, our next AI foundation models for protein generation. We develop compute-optimal scaling laws up to 46B parameters on 1.5T tokens with real evidence in the wet lab. Plus, we solve a new set of challenges for drug discovery.
Is the genome just a bag of genes? A new paper in @ScienceMagazine now reports that for two-thirds of an organism's genes, the position along the chromosome is actually very tightly constrained! Amazing work from my favorite night scientist @MartinJLercher and his team!