Ian Shi
@ianshi3
(Almost former) Graduate Student @ Department of Computer Science, University of Toronto. Building something new at @blankbio_
We're excited to release mRNABench, a new benchmark suite for mRNA biology containing 10 diverse datasets with 59 prediction tasks, evaluating 18 foundation model families. Paper: biorxiv.org/content/10.110… GitHub: github.com/morrislab/mRNA… Blog: blank.bio/post/mrnabench

Why do models trained on mRNA outperform those trained on DNA? We found the sequence 'language' is fundamentally different. Our compression-based analysis quantifies this distributional shift, showing the regulatory code in mature mRNA is distinct from other genomic regions.
I am excited to introduce mRNABench, a comprehensive benchmarking suite that we used to evaluate the representational capabilities of 18 families of nucleotide foundation models on tasks specific to mature mRNA. Paper: doi.org/10.1101/2025.0… Code: github.com/morrislab/mRNA… A 🧵
Can neural networks learn to map from observational datasets directly onto causal effects? YES! Introducing CausalPFN, a foundation model trained on simulated data that learns to do in-context heterogeneous causal effect estimation, based on prior-fitted networks (PFNs). Joint…
🚀 Problem: Language models struggle with rapidly evolving info and context in fields like medicine & finance. We need ways to post-train LLMs to control how they absorb new knowledge. 🔍 Insight: Why not explain, and teach, LLMs how to learn? @YounwooC will be at #ICLR2025…
Announcing Evo 2: The largest publicly available AI model for biology to date, capable of understanding and designing genetic code across all three domains of life. arcinstitute.org/manuscripts/Ev…
New preprint claims that most existing DNA language models perform just as well with random weights, suggesting that pretraining does nothing (Mistral & DNABERT-2 look like exceptions). We need better DNA language models.
Phil (@phil_fradkin) and I will be presenting Orthrus (biorxiv.org/content/10.110…) as a spotlight poster at the Workshop on AI for New Drug Modalities at #NeurIPS2024! Our poster will be up starting at 11:40 AM in West Meeting Rooms 109 and 110. Excited to be sharing some new results!