Rayan Chikhi

@RayanChikhi

Researcher in bioinformatics @institutpasteur and @CNRS. Tweets about methods for DNA sequencing data analysis, and genome assembly.

Joined September 2011

485 Following

3K Followers

Pinned

Rayan Chikhi@RayanChikhi · Jul 31

Today we’re excited to freely share an early version of what is perhaps the world’s most expansive genetics dataset: Logan. #bioinformatics #petabase #genetics #genomics #openscience biorxiv.org/content/10.110…

136

320

111

57.0K

Rayan Chikhi Retweeted

Kevin K. Yang 楊凱筌@KevinKaichuang · 23 h

In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known. Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.

197

17.0K

Rayan Chikhi Retweeted

Bernardo Rodriguez Martin@BerniRdgz · Jul 23

*New Open-Access Long Read Resource*. We sequenced 1,019 genomes from the 1000 Genomes Project sample cohort using @nanopore. Sequencing data is available at bit.ly/4m8dlE2. @embl @HHU_de @IMPvienna @CRGenomica nature.com/articles/s4158… [1/8]

186

13.0K

Rayan Chikhi Retweeted

Sebastian Deorowicz@sdeorowicz · Jul 19

Interested in a tool that aligns millions of proteins in minutes with quality similar to or better than the state-of-the-art utilities? Please take a look at our FAMSA2 paper: biorxiv.org/content/10.110… and GH repo: github.com/refresh-bio/FA…

6.0K

Rayan Chikhi Retweeted

Heng Li@lh3lh3 · Jul 8

Preprint on "Finding easy regions for short-read variant calling from pangenome data": arxiv.org/abs/2507.03718

141

10.0K

Rayan Chikhi@RayanChikhi · Jun 25

🧵1/n Estimating mutation rates using k-mers is fast, but what happens when repeats dominate the genome? In a new preprint, @HaonanWu_1998, Antonio Blanca, and I propose a *repeat-aware* estimator that's accurate even in centromeres.

bioRxiv Bioinfo@biorxiv_bioinfo · Jun 25

A k-mer-based estimator of the substitution rate between repetitive sequences biorxiv.org/content/10.110… #biorxiv_bioinfo

2.0K

Rayan Chikhi Retweeted

Tatta Bio@tatta_bio · Jun 20

We are thrilled to announce our new publication in Science Advances: Gaia, an AI-powered protein search platform that brings genomic context into functional annotation. science.org/doi/10.1126/sc… Gaia enables rapid, scalable discovery of remote homologs across 131,000+ genomes —…

2.0K

Rayan Chikhi Retweeted

PM @[email protected] @pashadag.bsky.social@pashadag · Jun 12

1/4 Hash functions in genomic sequence analysis (tinyurl.com/4kk9ccmt) : a new survey written together with @shaomingfu, @kanatos92, @xianglipsu, and Qian Shi. Before submitting it, we are posting it online to get feedback from the community.

3.0K

Rayan Chikhi Retweeted

Giulio Ermanno Pibiri@giulio_pibiri · Jun 10

A monumental collaborative effort with many incredible people ☺️ Proud to be part of this! arxiv.org/abs/2506.06536

2.0K

Rayan Chikhi@RayanChikhi · Jun 3

Slides from my talk (with Kamil Jaron) on a history of k-mers in bioinformatics: rayan.chikhi.name/pdf/2025-kmers…

5.0K

Rayan Chikhi Retweeted

Jim Shaw@jim_elevator · May 28

Announcing myloasm, a new long-read (ONT R10/PacBio) metagenome assembler. With @lh3lh3. myloasm-docs.github.io

14.0K

Rayan Chikhi Retweeted

Josipa Lipovac@JosipaLipovac · May 16

I am happy to share our new preprint introducing MADRe - a pipeline for Metagenomic Assembly-Driven Database Reduction, enabling accurate and computationally efficient strain-level metagenomic classification. @msikic, @r_vicedomini, @KrizanovicK 🔗biorxiv.org/content/10.110… 1/9

2.0K

Rayan Chikhi Retweeted

Sebastian Deorowicz@sdeorowicz · May 15

Vclust (the ultra-fast, high-accuracy tool for viral genome comparison & clustering) is now published: nature.com/articles/s4159… Great collaboration with @a_zielezinski, @AdamGudys, UAM guys, and Bas E.Dutilh

2.0K

Rayan Chikhi Retweeted

Human Pangenome Reference Consortium@HumanPangenome · May 12

📢 HPRC Release 2 is here! Now with phased genomes from 200+ individuals, a 5x increase from Release 1. Explore sequencing data, assemblies, annotations & alignments in our interactive data explorer ⬇️: humanpangenome.org/hprc-data-rele…

3.0K

Rayan Chikhi Retweeted

Lovro Vrček@lovrovrcek · May 3

GNNome was published in @genomeresearch! This is a novel paradigm for de novo genome assembly based on GNNs. Without explicitly implementing any simplification strategies, it can achieve results comparable or higher than other SOTA tools. Paper, code, and overview are 👇 [1/8]

132

24.0K

Rayan Chikhi Retweeted

Noam Teyssier@noamteyssier · Apr 29

Extracting @NCBI SRA files with fasterq-dump can require 17x the size of the accession while decompressing. Our new tool xsra extracts sequences at 5x throughput with significantly less disk usage, built-in compression, and optional BINSEQ outputs github.com/arcInstitute/x…

314

183

43.0K

Rayan Chikhi Retweeted

bioRxiv Bioinfo@biorxiv_bioinfo · Apr 25

High-quality metagenome assembly from nanopore reads with nanoMDBG biorxiv.org/content/10.110… #biorxiv_bioinfo

2.0K

Rayan Chikhi@RayanChikhi · Apr 18

New preprint on hifiasm (ONT)! We can now achieve near T2T human genome assembly using only ONT Simplex reads—in just half a day, with or without ultra-long sequencing. biorxiv.org/content/10.110…

Mike Vella@vellamike · Apr 18

Telomere-to-telomere de novo assembly from standard ONT reads (LSK114, Simplex). A really exciting advance—makes high-quality assembly practical for population-scale sequencing! Preprint from @ChengChhy, @lh3lh3 and colleagues biorxiv.org/content/10.110…

125

16.0K

Rayan Chikhi Retweeted

Karel Břinda@KarelBrinda · Apr 10

A decade ago, we had thousands of bacterial genomes. Now, we have millions. How to scale computational methods? Our paper in @naturemethods answers this: use evolutionary history to guide compression and search. …From terabytes to tens of GBs… w/@Baym @ZaminIqbal et al. 🧵1/

165

11.0K

Rayan Chikhi Retweeted

Heng Li@lh3lh3 · Mar 24

longcallD is a new variant caller for genomic long reads. It jointly calls phased small and structural variants. Single binary, one command line for the whole process. Comparable accuracy to mainstream callers. Great work by Yan Gao. github.com/yangao07/longc…

256

17.0K

Rayan Chikhi Retweeted

Krithik Ramesh@KrithikTweets · Mar 21

🧬 Meet Lyra, a new paradigm for accessible, powerful modeling of biological sequences. Lyra is a lightweight SSM achieving SOTA performance across DNA, RNA, and protein tasks—yet up to 120,000x smaller than foundation models (ESM, Evo). Bonus: you can train it on your Mac. read…

148

735

526

110.0K