Peter Koo
@pkoo562
Associate Professor @CSHL, advancing deep learning for genomics
The premier conference on Machine Learning for Computational Biology is Sep 9-10 at the NY Genome Center in NYC! Submission deadline is June 1 for 2-page abstracts and 8-page papers (eligible for proceedings track). Registration is now open! (Link below) Please retweet!

🚀 Join CongLab @Stanford! We’re hiring postdocs to create lab-in-the-loop, self-evolving AI agents and open benchmarks to design, test & learn, advancing safer gene & cell therapies. Build on CRISPR-GPT, the RNAGenesis model, and Genome-Bench for innovative medicines. #PostdocJobs
I'm excited to share work on a research direction my team has been advancing: connecting machine learning-derived genetic variant embeddings to downstream tasks in human genetics. This work was led by the amazing @divyanshi91! biorxiv.org/content/10.110…
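As a rough illustration of the setup described in that tweet (not the paper's actual pipeline): precomputed variant embeddings can be fed to a simple linear probe for a downstream label such as pathogenicity. The file names and the label choice below are hypothetical.

```python
# Minimal sketch (not the paper's method): probe precomputed variant
# embeddings on a downstream human-genetics task. File names and the
# pathogenic-vs-benign labels are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X = np.load("variant_embeddings.npy")  # (n_variants, embed_dim)
y = np.load("variant_labels.npy")      # binary downstream labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# A linear probe keeps the evaluation about the embeddings, not the head.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("auROC:", roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]))
```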
BTW this work was originally presented at the ICML Workshop on Comp Bio in 2023. That’s 2 years ago!!! This kind of critical but fair view should not come to light this late, especially when new gLMs are being released every week!
Our work on "Evaluating the representational power of pre-trained DNA language models for regulatory genomics" led by @AmberZqt with help from @NiraliSomia & @stevenyuyy is finally published in Genome Biology! Check it out! genomebiology.biomedcentral.com/articles/10.11…
We observed the same across six promoter variant effect benchmarks. Evo 2 did worse than basic CNNs and scored ~0.5 auROC (i.e., near chance) on some tasks.
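For context, this is roughly how such a zero-shot variant-effect evaluation is scored (the score and label files below are hypothetical, not the actual benchmark data): an auROC of ~0.5 means the model's scores rank functional vs. neutral variants no better than chance.

```python
# Sketch of the zero-shot scoring implied above; no training involved.
# File names are placeholders for illustration only.
import numpy as np
from sklearn.metrics import roc_auc_score

scores = np.load("glm_zero_shot_scores.npy")     # higher = predicted functional
labels = np.load("promoter_variant_labels.npy")  # 1 = functional, 0 = neutral

print("auROC:", roc_auc_score(labels, scores))   # chance level is 0.5
```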
*Easter egg alert* NOT in the published paper. We also benchmarked Evo 2, and while it did better than other gLMs (consistent with the idea that scale can improve gLMs), it still falls short of a basic CNN trained on one-hot sequences, and far short of supervised SOTA x.com/pkoo562/status…
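For readers unfamiliar with the baseline: a "basic CNN trained on one-hot sequences" has roughly this shape. The layer sizes and sequence length below are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch of a basic CNN baseline over one-hot DNA (illustrative
# hyperparameters, not the paper's model), in PyTorch.
import torch
import torch.nn as nn

class BasicCNN(nn.Module):
    def __init__(self, seq_len=230, n_tasks=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(4, 128, kernel_size=19, padding=9),  # input: one-hot ACGT channels
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(128, 256, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(256, n_tasks)

    def forward(self, x):              # x: (batch, 4, seq_len) one-hot DNA
        h = self.body(x).squeeze(-1)   # (batch, 256)
        return self.head(h)

model = BasicCNN()
dummy = torch.zeros(8, 4, 230)         # batch of 8 one-hot sequences
print(model(dummy).shape)              # torch.Size([8, 1])
```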
we're recruiting for a scientist to join our Talent team @newlimit. this is a unique chance for a scientist to move into the business & ops side of biotech. we imagine recent PhDs & postdocs looking to transition away from the bench are great candidates. you'll be working very…
Practically useful & biologically aligned benchmarks such as this one from the @pkoo562 lab consistently show that all the overhyped annotation-agnostic DNA language models are actually terrible for transcriptional regulatory DNA in humans (mammals). 1/
Evaluating the representational power of pre-trained DNA language models for regulatory genomics
1/ 🧬 Pre-trained DNA language models (gLMs) offer potential for interpreting complex cis-regulatory patterns, but their utility in functional genomics remains debated.
2/ 🚀 gLMs…
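A hedged sketch of the probing protocol this kind of evaluation implies: freeze the gLM, mean-pool its hidden states into a sequence embedding, and train only a lightweight head on top. The API shown is the standard Hugging Face transformers interface; the checkpoint name is a placeholder, not one used in the paper.

```python
# Illustrative embedding extraction from a frozen pre-trained gLM.
# "some-org/some-dna-lm" is a placeholder checkpoint name.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("some-org/some-dna-lm")
glm = AutoModel.from_pretrained("some-org/some-dna-lm").eval()

@torch.no_grad()
def embed(seq: str) -> torch.Tensor:
    ids = tok(seq, return_tensors="pt")
    hidden = glm(**ids).last_hidden_state   # (1, n_tokens, hidden_dim)
    return hidden.mean(dim=1).squeeze(0)    # mean-pooled sequence embedding

e = embed("ACGTACGTACGT")
print(e.shape)   # (hidden_dim,) — feed these embeddings to a small probe
```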
(1/n) Can we disentangle the effects of multiple covariates (e.g., sex, age, disease) to predict multiple counterfactual outcomes at the single-cell level? We introduce CellDISECT, a causal generative model that addresses this by learning to generate synthetic counterfactuals…
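As a rough sketch of the counterfactual idea (this is NOT CellDISECT's actual model, just a generic conditional encoder-decoder for illustration): encode a cell's expression into a latent state, then decode it under a swapped covariate code to generate a synthetic counterfactual.

```python
# Generic counterfactual-generation sketch, not CellDISECT's implementation.
# All dimensions are illustrative.
import torch
import torch.nn as nn

N_GENES, N_LATENT, N_COV = 2000, 32, 3   # genes, latent dims, covariates

encoder = nn.Sequential(nn.Linear(N_GENES, 256), nn.ReLU(), nn.Linear(256, N_LATENT))
decoder = nn.Sequential(nn.Linear(N_LATENT + N_COV, 256), nn.ReLU(), nn.Linear(256, N_GENES))

def counterfactual(x, new_covariates):
    """Re-decode cell x as if it had `new_covariates` (e.g., disease -> healthy)."""
    z = encoder(x)   # ideally a covariate-free latent state
    return decoder(torch.cat([z, new_covariates], dim=-1))

x = torch.randn(1, N_GENES)                              # one cell's expression profile
cf = counterfactual(x, torch.tensor([[1.0, 0.0, 0.0]]))  # swapped covariate code
print(cf.shape)   # torch.Size([1, 2000])
```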