Davide D'Ascenzo
@davide_dascenzo
Italian National PhD Student in AI | Università degli Studi di Milano | Politecnico di Torino
🚀 Training deep learning models on massive single-cell datasets is now fast & easy! 🧬 scDataset enables fast random sampling directly from disk: no memory blowup, no format conversion. On Tahoe-100M, it is up to 48× faster than AnnLoader! 🔗 github.com/Kidara/scDatas… 📄 arxiv.org/abs/2506.01883

We love it when others build dev tools for working with Tahoe-100M. Today, we highlight scDataset on our @tahoe_ai blog. Developed by @davide_dascenzo & @sebacultrera, it makes it easier to train DL models on datasets at the daunting scale of Tahoe-100M (see the poster at ICML '25).
Excited to share our latest preprint, introducing the hierarchical cross-entropy (HCE) loss: a simple change that consistently improves performance in atlas-scale cell type annotation models. doi.org/10.1101/2025.0…