Xinyu Yuan
@XinyuYuan402
Transfer learning and generalization problems for representation learning, including different data modalities like knowledge graphs, protein sequences, etc.
A few friends and I started the Multiomics Reading Group a while back, but we never got around to announcing it. So we're doing that now! Alongside my co-host César Miguel Valdez Córdova and organizer @DylanMannK we have (mostly) weekly presentations from 11 AM-12 PM EST.
I fully agree. A lot of folks reach out from CS with superb papers and want to do AIxBiology, yet they are not keen to understand the biology, and some consider data cleaning "donkey work". The best methods out there are clearly designed based on inductive biases and elements…
For biological data, if you don't have deep expertise in this supposedly "low-value" work called data cleaning, you are lacking a fundamental understanding of the idiosyncrasies of the data. Without this knowledge, it is impossible to seriously model the data.
New OpenProblems paper out! 📝 Led by Malte Lücken with Smita Krishnaswamy, we present openproblems.bio – a community-driven platform benchmarking single-cell analysis methods. Excited about transparent, evolving best practices for the field! 🔗 nature.com/articles/s4158…
⏰ 20th Multiomics RG Talk by @AdaFang_ this Wednesday, Jun 18th, 11am-12pm EST
Affiliation: Harvard University
Talk: ATOMICA: Learning Universal Representations of Intermolecular Interactions
Paper Link: biorxiv.org/content/10.110…
Meeting Link: meet.google.com/bvv-fcdn-rxm
🙌 Our code is now open-sourced! All data preprocessing, pre-training, and downstream scripts, along with the processed data and pretrained model weights, are released at github.com/KatarinaYuan/S…
🚀 Introducing StructTokenBench, the first comprehensive benchmark for protein structure tokenization (PST), and our new method, AminoAseed, which outperforms ESM3's PST across all benchmarking perspectives.
📄 Paper: arxiv.org/pdf/2503.00089
🔹 Open-source in one month! Stay tuned!
Check our latest work at #ICLR2025 on using discrete diffusion language models for understanding protein dynamics!
ICLR'25 | Structure Language Models for Protein Conformation Generation
Keywords: language modeling | discrete diffusion | PLM fine-tuning
🔗 arXiv: arxiv.org/abs/2410.18403
🐱🐙 Github: github.com/lujiarui/esmdi…
📍 Fri | 25 Apr. 10am | Hall 3 + Hall 2B
Happy to chat if you're in 🇸🇬!!
🔥 (1/5) Introducing GeoFlow V2: a unified atomic diffusion model for protein design
- Unifies structure prediction & de novo design with versatile constraint support
- SOTA results in antibody:antigen (Ab:Ag) folding & epitope-specific Ab design
Try it and read our report at prot.design!
Protein Structure Tokenization: Benchmarking and New Recipe
1/ Recent advances in protein structure tokenization (PST) methods enable direct application of language modeling techniques to protein 3D structures. However, the capabilities and limitations of these methods remain…