Garyk Brixi
@garykbrixi
PhD student at Stanford Genetics. DNA BERTologist
In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known. Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.
Evo 2 update: new dependency versions (torch, transformer engine, flash attn) and a docker option mean it should be easy to set up without needing to compile locally. Happy ATGC-ing! github.com/ArcInstitute/e…
Staff scientist position (computational): I am looking for a computational scientist to join my genomics lab at Stanford. They should have an outstanding skillset in ML/statistical methods for genomic applications, postdoc experience and a strong publication record. #sciencejobs
Delighted to announce @arcinstitute's Virtual Cell Challenge, a recurring, open, community-driven challenge to benchmark cellular foundation models. See our announcement in @CellCellPress below, with prizes up to $100,000, sponsored by @Nvidia @10xGenomics @UltimaGenomics!
Register today for the Virtual Cell Challenge and use AI to solve one of biology’s most complex problems. Announced in @CellCellPress, the competition is hosted by Arc Institute and sponsored by @nvidia, @10xGenomics, and @UltimaGenomics.
Excited to share #AlphaGenome, the start of our AlphaGenome journey to decipher the regulatory genome! The model matches or exceeds top-performing external models on 24 out of 26 variant evaluations, across a wide range of biological modalities. 1/6
Got several responses exploring which software can deal with this. My point was that students should not rely on the precision of any standard software/functions but should learn how to maintain precision and avoid underflow/overflow before starting to code.
Quantitative graduate students should start by studying this. Would save them endless frustrations.
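The posts above do not say which techniques they have in mind. As one hedged illustration (a sketch, not taken from the thread), two of the most common tricks for keeping precision in likelihood-based work are the log-sum-exp shift and accumulating log-probabilities instead of multiplying raw probabilities. The function names below are hypothetical, and the code uses only Python's standard `math` module.

```python
import math


def naive_logsumexp(xs):
    """Direct log(sum(exp(x))). Fails for inputs of large magnitude."""
    # math.exp(1000.0) raises OverflowError; math.exp(-1000.0) underflows
    # to 0.0, so the sum can become 0.0 and math.log(0.0) raises ValueError.
    return math.log(sum(math.exp(x) for x in xs))


def logsumexp(xs):
    """Stable log(sum(exp(x))) via the max-shift identity.

    log(sum_i exp(x_i)) = m + log(sum_i exp(x_i - m)),  m = max_i x_i
    After the shift every exponent is <= 0, so exp() cannot overflow,
    and the largest term is exp(0) = 1, so the sum cannot underflow to 0.
    """
    m = max(xs)
    if math.isinf(m):
        # All terms are -inf (or some term is +inf); shifting would give nan.
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def product_of_probs(ps):
    """Multiplying many small probabilities underflows to 0.0."""
    result = 1.0
    for p in ps:
        result *= p
    return result


def log_product_of_probs(ps):
    """Summing log-probabilities keeps the result representable."""
    return sum(math.log(p) for p in ps)


if __name__ == "__main__":
    print(logsumexp([1000.0, 1000.0]))  # 1000 + log(2)
    ps = [1e-5] * 100
    print(product_of_probs(ps))         # 0.0 (true value is 1e-500)
    print(log_product_of_probs(ps))     # about -1151.29
```

Library routines such as `scipy.special.logsumexp` implement the same shift; the point of the sketch is that the stable form is a short identity worth understanding before relying on any particular package.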
🚨ICML Paper Alert🚨 What if finding the right protein homologs wasn't a slow search, but a learned part of the model itself? We introduce 𝐏𝐫𝐨𝐭𝐫𝐢𝐞𝐯𝐞𝐫, an end-to-end framework that learns to retrieve the most useful homologs for self-supervised reconstruction! (1/12)
🚨 New paper 🚨 RNA modeling just got its own Gym! 🏋️ Introducing RNAGym, large-scale benchmarks for RNA fitness and structure prediction. 🧵 1/9
🚀 Excited to release BoltzDesign1! ✨ Now with LogMD-based trajectory visualization. Feedback & collabs welcome! 🙌 🔗 Demo: rcsb.ai/ff9c2b1ee8 🔗 GitHub: github.com/yehlincho/Bolt… 🔗 Colab: colab.research.google.com/github/yehlinc… @sokrypton @MartinPacesa
From Likelihood to Fitness: Improving Variant Effect Prediction in Protein and Genome Language Models 1. This study introduces Likelihood-Fitness Bridging (LFB), a method that improves variant effect prediction in protein and genome language models (pLMs and gLMs) by averaging…
excited to finally share on arxiv what we've known for a while now: All Embedding Models Learn The Same Thing. embeddings from different models are SO similar that we can map between them based on structure alone, without *any* paired data. feels like magic, but it's real: 🧵
this is sick. all i'll say is that these GIFs are proof that the biggest bet of my research career is gonna pay off. excited to say more soon
Genomes encode biological complexity, which is determined by combinations of DNA mutations across millions of bases. In new @arcinstitute work, we report the discovery and engineering of the first programmable DNA recombinases capable of megabase-scale human genome rearrangement.
What if we could universally recombine, insert, delete, or invert any two pieces of DNA? In back-to-back @Nature papers, we report the discovery of bridge RNAs and 3 atomic structures of the first natural RNA-guided recombinase - a new mechanism for programmable genome design
Reading group tomorrow: @json_yim and @woodyahern present "Atom level enzyme active site scaffolding using RFdiffusion2" biorxiv.org/content/10.110… Join on Zoom at 9am PT / 12pm ET / 6pm CEST: portal.valencelabs.com/starklyspeaking
🚨 New in @ImmunityCP ! EVE-Vax, an AI model that anticipates future viral evolution and designs antigens to proactively test vaccines + therapeutics—before variants even emerge. We envision this work will help make future-proofed vaccines and therapeutics. 👇 (1/7)
Thrilled that our work "Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN" has been accepted to ICML 2025! 🥳 Looking forward to connecting with the ML and comp bio communities in Vancouver this July! :)
Excited to share our joint work with @richardwshuai, Full-Atom MPNN (FAMPNN), a protein sequence design method that explicitly models both sequence and side-chain structure! 🧵 1/N
Thanks Abhi @owl_posting for a wonderful conversation! It was an honor to be the 3rd guest on the show. 😎
What could Alphafold 4 look like? (Sergey Ovchinnikov, Ep #3) 2 hours listening time (links below) To those in the (machine-learning for protein design) space, Dr. Sergey Ovchinnikov (@sokrypton) is a very, very well-recognized name. A recent MIT professor (circa early…
Super excited to present Dyna-1 next week with @HWaymentSteele !!
Next Tues (4/29) at **4:30PM** ET, we will have @ginaelnesr @HWaymentSteele present "Learning millisecond protein dynamics from what is missing in NMR spectra" Paper: biorxiv.org/content/10.110… Sign up on our website for zoom links!
It was awesome to join the @sequoia podcast with @josephinekchen and @gradypb to chat about modeling biology!
Biology has been “guess and check” for too long. The missing piece? Predictive models! @pdhsu, co-founder of @ArcInstitute, is using AI to transform drug discovery.