Erik Garrison
@erikgarrison
(pan)genomes from many points of view. Assistant Professor UTHSC Memphis
For almost a year I've been using Whisper speech-to-text instead of typing: github.com/ekg/loq
Postdoc position opening in my group! Research projects: pangenomes for diverse organisms, genome evolution, biocomputing, language models. Please reach out if interested!
The plan at FutureHouse has been to build scientific agents and use them to make novel discoveries. We’ve spent the last year researching the best way to make agents. We’ve made a ton of progress and now we’ve engineered them to be used at scale, by anyone. Today, we’re launching…
We suspect OpenAI's books2 dataset might be "all of libgen", but no one knows. It's all pure conjecture. Nonetheless, books3, released above, is "all of bibliotik", which I imagine will be of interest to anyone doing NLP work. Or anyone who wants to read 196,640 books. :)
In a new preprint, @curious_coding and I led the derivation of a new lower bound, g’ (red), on k-mer sampling scheme density: - our bound appears to be tight for k ≡ 1 (mod w). - existing schemes are nearly optimal! - mod-minimizers are optimal for large sigma for k ≡ 1 (mod w)
A near-tight lower bound on the density of forward sampling schemes biorxiv.org/cgi/content/sh… #biorxiv_bioinfo
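For readers unfamiliar with the term, the "density" of a k-mer sampling scheme is the fraction of k-mers it selects. The sketch below measures it for a plain lexicographic minimizer on a random DNA string and compares it with the trivial 1/w floor (every window of w consecutive k-mers must contain at least one sample). It is not the mod-minimizer and not the g' bound from the preprint; the function names, the seed, and the values of k and w are illustrative assumptions.

```python
import random


def minimizer_positions(seq, k, w):
    """Start positions of k-mers picked as the minimizer of some window.

    Assumption: plain lexicographic order with leftmost tie-breaking,
    used here only as a simple baseline scheme.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    # Slide a window over w consecutive k-mers and keep the smallest one.
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        picked.add(min(window, key=lambda i: (kmers[i], i)))
    return picked


def density(seq, k, w):
    """Fraction of all k-mers in seq that the scheme samples."""
    n_kmers = len(seq) - k + 1
    return len(minimizer_positions(seq, k, w)) / n_kmers


# Hypothetical parameters, chosen only for illustration.
k, w = 5, 4
rng = random.Random(0)
seq = "".join(rng.choice("ACGT") for _ in range(20000))
d = density(seq, k, w)
print(f"measured density: {d:.4f}, trivial floor 1/w: {1 / w:.4f}")
```

The measured value sits above 1/w; how close a scheme can get to the best achievable density is the question the preprint addresses.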
Memory makes computation universal, remember? thinks.lol/2025/01/what-c…

cool project where researchers applied interpretability techniques that have worked on language models to a model that uses protein sequences to predict protein structures. And indeed, the models are discovering real biological concepts that humans have also found.
A preview of some structural features: An alpha helix forms a full rotation every 3.6 residues. We find a feature which activates strongly on every 7th amino acid (approximately two full turns) in alpha helices.
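The "approximately two full turns" arithmetic can be checked directly: 3.6 residues per turn means 100 degrees per residue, so 7 residues sweep 700 degrees, 20 degrees short of two full turns. A minimal sketch, with helper names that are purely illustrative:

```python
RESIDUES_PER_TURN = 3.6   # from the post above
DEG_PER_RESIDUE = 100     # 360 degrees / 3.6 residues per turn


def turns(n_residues):
    """Number of helical turns spanned by n_residues."""
    return n_residues / RESIDUES_PER_TURN


def offset_from_whole_turn(n_residues):
    """Angular distance (degrees) from the nearest whole number of turns."""
    angle = (n_residues * DEG_PER_RESIDUE) % 360
    return min(angle, 360 - angle)


# Which spacing of 1..10 residues lands closest to a whole number of turns?
best = min(range(1, 11), key=offset_from_whole_turn)
print(best, round(turns(best), 2), offset_from_whole_turn(best))
```

Among spacings of up to ten residues, a step of 7 comes closest to the same face of the helix, which is consistent with a feature firing on every 7th residue.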
making a lot of pairwise alignments (PAF) and needed a way to confirm the validity of both the coordinates and the alignments: github.com/ekg/pafcheck
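To illustrate the kind of check involved, here is a minimal sketch that validates one PAF record using only the record itself: the coordinates must fall within the sequence lengths, and the `cg:Z:` CIGAR (if present) must consume exactly the query and target spans. This is not how pafcheck is implemented, and it does not compare against the underlying sequences; `check_paf_line` and the example records are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an op code.
CIGAR_OP = re.compile(r"(\d+)([MIDN=X])")


def check_paf_line(line):
    """Return a list of problems found in one PAF record (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return ["fewer than 12 mandatory columns"]
    qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
    tlen, tstart, tend = int(fields[6]), int(fields[7]), int(fields[8])
    problems = []
    if not 0 <= qstart <= qend <= qlen:
        problems.append("query coordinates out of range")
    if not 0 <= tstart <= tend <= tlen:
        problems.append("target coordinates out of range")
    # Optional tags start at column 13; look for the CIGAR tag.
    cigar = next((tag[5:] for tag in fields[12:] if tag.startswith("cg:Z:")), None)
    if cigar is not None:
        q_used = t_used = 0
        for length, op in CIGAR_OP.findall(cigar):
            n = int(length)
            if op in "M=X":      # consumes query and target
                q_used += n
                t_used += n
            elif op == "I":      # consumes query only
                q_used += n
            else:                # D or N: consumes target only
                t_used += n
        if q_used != qend - qstart:
            problems.append("CIGAR does not match query span")
        if t_used != tend - tstart:
            problems.append("CIGAR does not match target span")
    return problems


good = "q1\t10\t0\t10\t+\tt1\t12\t1\t12\t9\t11\t60\tcg:Z:5=1X1D4="
bad = "q1\t10\t0\t10\t+\tt1\t12\t1\t12\t9\t11\t60\tcg:Z:5=1X4="
print(check_paf_line(good), check_paf_line(bad))
```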
Our paper on the amylase locus is out in Nature (doi.org/10.1038/s41586…)! Working with this team was incredibly fun! We developed a computational framework (cosigt) to infer the haplotype composition of short read samples leveraging the pangenome graph. 1/n
One project develops computational approaches to infer human variation from graph genomes and its role in driving human phenotypic variation @davidebolo93 @raveancic @erikgarrison @psudmant. See also biorxiv.org/content/10.110…. Apply here: careers.humantechnopole.it/o/postdoc-in-p…
Three/four exciting 4-year computational post-doctoral positions in our team @humantechnopole to work on a variety of projects. Please retweet and share within your networks.
I have spent the last few months writing a simple transformer program that can be used easily on pangenome data. 1/n GitHub - mol-evol/panGPT: A Transformer for Pangenome data github.com/mol-evol/panGPT
🧬 Do you think pangenomics is hot? Attend our Workshop, Conference & Biohackathon (May 18-22, 2024). Discover genomic diversity, engage with leading scientists, and contribute to cutting-edge software. Register now! pangenome.github.io/MemPanG24/ #Pangenomics #Bioinformatics #MemPanG24
Finally, one of the cool insights that came from this work is that dark matter in biology (missing heritability) is arguably (via the equivalence of the Price equation and the virial theorem) an actual analog of dark matter in astronomy! 15/15 technologyreview.com/2010/12/21/197…
📢 Introducing In-Context Vectors (ICV), a more effective + controllable alternative to #LLM in-context learning arxiv.org/abs/2311.06668 Instead of prompting, we learn an LLM latent vector that captures user examples. Then we can steer text generation w/ ICV to get better results…
🧬💻👾 celebrating a year of great achievements with old (@otrebor87), current (@BuonaiutoSilvia @flavia_villani @GDamaggio @AndresGuarahino), remote (Maddie, Franco), favorite (@erikgarrison) collaborators🍾🥇🎊 looking forward to next year's projects @CnrIgb and @uthsc 🦸🦇❤️