Sebastian Deorowicz

@sdeorowicz

Data compression. Algorithms for genome sequencing compression and analysis.

Gliwice, Poland

Joined August 2019

31 Following

377 Followers

Interested in a tool that aligns millions of proteins in minutes with quality similar to or better than the state-of-the-art utilities? Please take a look at our FAMSA2 paper: biorxiv.org/content/10.110… and GH repo: github.com/refresh-bio/FA…

sdeorowicz's tweet card. Algorithm for ultra-scale multiple sequence alignments (3M protein sequences in 5 minutes and 24 GB of RAM) - refresh-bio/FAMSA

6.0K

Sebastian Deorowicz Retweeted

Nature Methods@naturemethods · May 15

Vclust generates fast and accurate estimation of average nucleotide identity (ANI) for viral genomes, scaling clustering to millions of genomes. @a_zielezinski @AdamGudys @sdeorowicz @Piotr_Rozwalak @UAM_Poznan @polsl_pl @UniJena nature.com/articles/s4159…

4.0K

Sebastian Deorowicz@sdeorowicz · May 15

Vclust (the ultra-fast, high-accuracy tool for viral genome comparison & clustering) is now published: nature.com/articles/s4159… Great collaboration with @a_zielezinski, @AdamGudys, UAM guys, and Bas E. Dutilh

sdeorowicz's tweet card. Nature Methods - Vclust generates fast and accurate estimation of average nucleotide identity for viral genomes, scaling clustering to millions of genomes.

2.0K

Sebastian Deorowicz@sdeorowicz · Dec 26

Recently, our SPLASH paper (nature.com/articles/s4158…) was published in NatBiotech. Now, we release its extended version, sc-SPLASH (biorxiv.org/content/10.110…), which allows reference-free analysis of single-cell data. It was a great experience to work with our collaborators on that!

1.0K

Sebastian Deorowicz@sdeorowicz · Nov 27

The latest hifiasm can directly assemble standard @nanopore simplex R10 reads, without HERRO correction or other preprocessing, to phased contigs of contiguity comparable to HiFi assembly. Like before, you can further add ultra-long, Hi-C or trio data for better assembly.

Mike Vella@vellamike · Nov 27

Exciting news! The latest hifiasm release from @ChengChhy and @lh3lh3 adds beta support for @nanopore simplex R10 reads. Initial results look very promising. 🚀 Check it out: github.com/chhylp123/hifi…

184

22.0K

Sebastian Deorowicz@sdeorowicz · Nov 25

AGC 3.2 (Assembled Genome Compressor) has been released. Better speed, better ratio (at least for bacterial genomes), optional low-memory decompression. github.com/refresh-bio/agc

sdeorowicz's tweet card. Assembled Genomes Compressor.

4.0K

Sebastian Deorowicz Retweeted

Roozbeh Dehghannasiri@roozbehdn · Oct 9

Happy to share our latest paper with @marekkoki on SPLASH2 for ultra-efficient reference-free discovery directly on raw sequencing reads out in @NatureBiotech, supervised by @SalzmanLab and @sdeorowicz, and with great contributions from @TBaharav. nature.com/articles/s4158…

6.0K

Sebastian Deorowicz@sdeorowicz · Sep 23

New paper online in @NatureBiotech by @sdeorowicz group and @SalzmanLab: SPLASH2 speeds up analysis of sequence variation in massive datasets.

Nature Biotechnology@NatureBiotech · Sep 23

Scalable and unsupervised discovery from raw sequencing reads using SPLASH2 go.nature.com/3N1SGBL

500

Sebastian Deorowicz Retweeted

Heng Li@lh3lh3 · Sep 4

Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with it, do affine-gap alignment and retrieve similar local haplotypes. 7.3Tb commonly sequenced bacterial genomes ⇒ 30GB arxiv.org/abs/2409.00613

224

730

250

191.0K

Sebastian Deorowicz@sdeorowicz · Jul 24, 2024

Pangene now published in Bioinformatics: doi.org/10.1093/bioinf…. In addition to showcasing applications (see the 17q21.31 inversion below), we also reviewed the theoretical formulation of bidirected graphs and discussed the definition and the finding of "bubbles" in such graphs.

Heng Li@lh3lh3 · Feb 28, 2024

Preprint on Exploring gene content with pangenome gene graphs: arxiv.org/abs/2402.16185. It describes pangene for building gene graphs and for calling gene-level variations which can be found at pangene.bioinweb.org. Pleasant collaboration with @maxgmarin and @MahaFarhat.

105

300

38.0K

Sebastian Deorowicz@sdeorowicz · Jul 10, 2024

I am happy to announce that ProteStAr, our compressor of CIF/PDB files with 3D atom coordinates, is now published at Bioinformatics. With this, you can store the whole ESM Atlas or AlphaFold DB in a few files (rather than 200M+) with fast random access. doi.org/10.1093/bioinf…

sdeorowicz's tweet card. Abstract. Motivation. The introduction of DeepMind's AlphaFold 2 enabled the prediction of protein structures at an unprecedented scale. AlphaFold Protein

3.0K

Sebastian Deorowicz Retweeted

Andrzej Zielezinski@a_zielezinski · Jul 9, 2024

When writing bioinformatics tools, I often need unique IDs for things like temp directories. So, I created a Python package for generating fun & memorable IDs like "retired-nucleotide" or "funny-malware-7ab4" covering everything from sports to science. github.com/aziele/unique-…

559
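The ID scheme described in the post above (an adjective-noun pair, optionally followed by a short hex suffix) can be sketched with the Python standard library alone. This is only an illustration of the idea, not the package's actual API: the word lists, the function name `memorable_id`, and the `suffix_length` parameter are all assumptions made for the example.

```python
import secrets

# Hypothetical word lists for illustration only; the real package's
# vocabulary and categories are not reproduced here.
ADJECTIVES = ["retired", "funny", "quiet", "rapid"]
NOUNS = ["nucleotide", "malware", "genome", "codon"]


def memorable_id(suffix_length: int = 0) -> str:
    """Return an 'adjective-noun' ID, optionally with a random hex suffix."""
    parts = [secrets.choice(ADJECTIVES), secrets.choice(NOUNS)]
    if suffix_length > 0:
        # token_hex(n) returns 2*n hex characters, so request enough
        # bytes and trim to the requested length.
        n_bytes = (suffix_length + 1) // 2
        parts.append(secrets.token_hex(n_bytes)[:suffix_length])
    return "-".join(parts)


if __name__ == "__main__":
    print(memorable_id())   # e.g. an ID of the form adjective-noun
    print(memorable_id(4))  # e.g. an ID of the form adjective-noun-xxxx
```

With such small word lists collisions are likely, so a real temp-directory name would probably want the suffix or a much larger vocabulary.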

Sebastian Deorowicz Retweeted

Andrzej Zielezinski@a_zielezinski · Jul 3, 2024

Excited to share Vclust! It's a fast and accurate tool for calculating intergenomic similarities (like ANI) and clustering virus/#phage genomes/contigs according to ICTV and MIUViG standards. 💻 Tool: github.com/refresh-bio/vc… 📄 Preprint: biorxiv.org/content/10.110… Thread! 1/6 ↓

104

14.0K

Sebastian Deorowicz@sdeorowicz · Jul 2, 2024

Clustering large datasets can be challenging. Fortunately, even slow methods can sprint for sparse similarity matrices. Clusty offers s-, c-link, uclust, set-cover, cd-hit, leiden. The paper shows an application for 15M+ sequences. github.com/refresh-bio/cl… biorxiv.org/content/10.110…

2.0K
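One way to see why a sparse similarity matrix helps, as the Clusty post above suggests: single-linkage clustering at a fixed threshold amounts to finding connected components over the stored pairs, so the work grows with the number of non-zero entries rather than with all pairs of sequences. The sketch below uses a plain union-find for this. It is not Clusty's implementation, and the function name `single_linkage`, the `(i, j, similarity)` input format, and the example threshold are assumptions chosen for illustration.

```python
def single_linkage(n_items, pairs, threshold):
    """Label items by single-linkage cluster at a similarity threshold.

    `pairs` holds only the stored (sparse) entries as (i, j, similarity).
    Returns one label per item: the smallest item index in its cluster.
    """
    parent = list(range(n_items))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, sim in pairs:
        if sim >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller index as the root so labels are stable.
                parent[max(ri, rj)] = min(ri, rj)

    return [find(x) for x in range(n_items)]


if __name__ == "__main__":
    sparse_pairs = [(0, 1, 0.97), (1, 2, 0.96), (3, 4, 0.80)]
    print(single_linkage(5, sparse_pairs, threshold=0.95))  # [0, 0, 0, 3, 4]
```

Items 0, 1 and 2 end up together because they are chained by pairs above the threshold, while the 0.80 pair is ignored and items 3 and 4 stay as singletons.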

Sebastian Deorowicz@sdeorowicz · Jul 2, 2024

After a few years of development, Kmer-db v.2, our tool for finding similar sequences in large collections of genomic data (even millions of viral genomes), is ready. If interested, take a look at the GitHub repo and related paper. github.com/refresh-bio/km… biorxiv.org/content/10.110…

7.0K

Sebastian Deorowicz@sdeorowicz · Mar 18, 2024

For the current (and future) users: AGC 3.1 (Assembled Genome Compressor) is ready for download: github.com/refresh-bio/agc Main updates: support for ARM-based CPUs, e.g., Mac M1/M2/...; some bug fixes; some new features; speed optimizations. Bioconda package should be ready soon.

6.0K

Sebastian Deorowicz Retweeted

Zamin Iqbal@ZaminIqbal · Mar 11, 2024

First step in a community project to provide a uniformly assembled, annotated and searchable set of bacterial genomes, our preprint on our initial release of 1.9 million genome assemblies+taxonomic estimates. (figure compares with previous 661k dataset) biorxiv.org/content/10.110…

159

361

92.0K

Sebastian Deorowicz@sdeorowicz · Jan 29, 2024

Exciting news! 🎉 Our research on ancient phages in the human gut by @Piotr_Rozwalak is now out in @NatureComms! 📚🔬 A big shoutout to @BEDutilh and @RajithaYasas1 for an amazing collaboration.

Piotr@Piotr_Rozwalak · Jan 29, 2024

Unveiling the ancient history of bacteriophages!🧬🔬 We've discovered a nearly identical phage genome from 1300 years ago, providing insights into phage-bacteria interactions spanning millennia. 🤯 🌐@a_zielezinski, @BEDutilh @RajithaYasas1 nature.com/articles/s4146…

11.0K

Sebastian Deorowicz@sdeorowicz · Jan 26, 2024

We've just published a new release of RECKONER, our tool for Illumina read correction. The paper also evaluates the impact of read correction in variant calling pipelines. nature.com/articles/s4159…

1.0K

Sebastian Deorowicz Retweeted

bioRxiv Bioinfo@biorxiv_bioinfo · Jan 22, 2024

Ultra-efficient, unified discovery from microbial sequencing with SPLASH and precise statistical assembly biorxiv.org/cgi/content/sh… #biorxiv_bioinfo

1.0K