Wei Shen 沈 伟
@shenwei356
Associate professor of Bioinformatics at Chongqing Medical University, China. Lab: https://mbio.info Personal: http://shenwei.me/ http://shenwei356.bsky.social
Excited to share this collaborative work with @ZaminIqbal ! We're thrilled about the potential impact LexicMap could have on advancing microbial genomics research! Imagine how cool it is to search for a gene (with >30k hits) against 2 million bacterial genomes within 1 minute!
Great pleasure to work with @shenwei356 on a new indexing and alignment scheme, called LexicMap: biorxiv.org/content/10.110… We have been working on uniformly reassembling, QC-ing and annotating all bacterial (+ now archaeal) data, & wanted to be able to do full alignment to it....
Yes, I just recommended it to my students.
Sandbox.bio is shaping up to become one of the absolute top resources for learning hands-on #bioinformatics and #genomics today!
Slides from my talk (with Kamil Jaron) on a history of k-mers in bioinformatics: rayan.chikhi.name/pdf/2025-kmers…
This seems like an awesome course: jshun.csail.mit.edu/6506-s24/ If there were more hours in the day, I'd want to put something like this together at UMD.
⚡️LexicMap v0.7.0 fixed a minor bug in index building and improved the alignment accuracy! Please rebuild the existing index. Sorry for the inconvenience. 🥹 github.com/shenwei356/Lex…
A decade ago, we had thousands of bacterial genomes. Now, we have millions. How to scale computational methods? Our paper in @naturemethods answers this: use evolutionary history to guide compression and search. …From terabytes to tens of GBs… w/@Baym @ZaminIqbal et al. 🧵1/
Thrilled that our work on this problem with @KarelBrinda, @ZaminIqbal, and others is out in @naturemethods today! We used phylogenetic compression (described in the thread) to compress every microbe ever sequenced onto a flash drive so that it can be searched with a laptop!
So we asked: what sets the fundamental limit on computation on large genomic databases? Evolution! The irreducible entropy in genome collections is bounded by the most parsimonious path to introduce that variability. In other words, optimal compression should echo phylogeny. 4/
As for my project (Project 1 in the list), please help spread the word to students interested in doing a PhD in Computational Biology/Bioinformatics. Note that PhD students here are employed with a salary, that the position comes with benefits, and that there are no tuition fees.
🚀 LexicMap v0.6.0 is released! ✅ More accurate alignments! 🎯 Higher sensitivity for short queries (>100bp)! 💡 Denser seeds, same index size! 🔬 Function: Efficient seq alignment in millions of prokaryotic genomes! 📖 Docs: bioinf.shenwei.me/LexicMap github.com/shenwei356/Lex…
Bacteriophage protein Dap2 inhibits bacterial type III secretion system and synergizes with Dap1 to evade anti-phage immunity biorxiv.org/content/10.110…
Just updated seqkit, csvtk, taxonkit, and rush. Click to see changes: - github.com/shenwei356/seq… - github.com/shenwei356/tax… - github.com/shenwei356/csv… - github.com/shenwei356/rus…
I'm glad to announce that the simd-minimizers library is out! @curious_coding and I have been optimizing the computation of minimizers down to the smallest detail. The result is an order of magnitude faster than existing methods; processing an entire human genome takes only 4s!
SimdMinimizers: Computing random minimizers, fast biorxiv.org/cgi/content/sh… #biorxiv_bioinfo
I did a project on making an optimized implementation of the S+ tree. The result is a 40x speedup over plain binary search! It builds on Algorithmica's post on S-trees and the famous paper "Array layouts for comparison based searching" by @pkhuong. 🧵 curiouscoding.nl/posts/static-s…