fajie yuan
@duguyuan
Assistant Prof at Westlake University
We release our protein chatGPT, Evola! 🌟 chat-protein.com Evola comes in two versions: 10B & 80B. The 80B model has a 1.3B Saprot encoder & a 70B LLaMA3 decoder. Trained on 546 protein question-text pairs with 150 billion word tokens! 💡🔬 biorxiv.org/content/10.110…
How can we effectively decode and understand the complex molecular language of proteins to unlock their functional secrets at scale? @biorxivpreprint @Westlake_Uni "Decoding the Molecular Language of Proteins with Evola" • Scientists have developed Evola, an 80 billion…
After DeepSeek R1, there's a new Claude 4-level model from China that outperforms DeepSeek v3, Qwen and OpenAI GPT-4.1. Meet Kimi k2: a 1-trillion-parameter model purpose-built for agentic workflows with native MCP integration. 100% open source and FREE to try. Let that sink in.
Put together a gif showing how NNs have taken over CASP 😀
This week in biotechnology: 1. There is a lot of skepticism about virtual cells. I don't think negative criticism is worthwhile, in part, because the first version of everything tends to be bad. Also, AlphaFold2 came out a full 26 years after the CASP competition first began!…
SSAlign: Ultrafast and Sensitive Protein Structure Search at Scale 1. The rapid growth of protein structure databases, fueled by AlphaFold3 and ESMFold, demands faster, more sensitive search tools. Existing methods like Foldseek struggle with sensitivity and scalability,…
We’re excited to introduce Chai-2, a major breakthrough in molecular design. Chai-2 enables zero-shot antibody discovery in a 24-well plate, exceeding previous SOTA by >100x. Thread👇
Always a pleasure to interact and work with @anthonygitter !! He is up to some fun things at Morgridge :)
.@anthonygitter designs computational methods to study diseases and develop new drugs and proteins. He also develops machine learning models to speed up the process of drug discovery. Learn more about his work in Faces of Data Science: datascience.wisc.edu/#faces @Morgridge_Inst
⏰ We introduce Reinforcement Pre-Training (RPT🍒), reframing next-token prediction as a reasoning task using RLVR ✅ General-purpose reasoning 📑 Scalable RL on web corpus 📈 Stronger pre-training + RLVR results 🚀 Allows allocating more compute to specific tokens
deeply honored.
It feels like they're announcing a festival along with the headliners. Here is a personal list of researchers in AI-based protein science whose work I really admire, they’re incredibly creative! And a longer list of other colleagues working in the field x.com/miangoar/statu…
1/2 This meme is gold 😂 But I don't know, maybe I'm in the middle of the distribution. I think AF3 was designed to mimic the data rather than truly understand it. Because of that, there are issues with hallucination and memorization as indicated here x.com/miangoar/statu…
1/2 Ha, so far structure pred looks like: AlphaFold2: we added phys/chem/evo traits to our model. AF3: we removed almost all bio-inspired traits to create a more generalizable model. Post-AF3: we have problems of hallucination/memorization; future models require bio-inspired traits.
cool
VENUSX: Unlocking Fine-Grained Functional Understanding of Proteins 1. VENUSX is the first large-scale benchmark specifically designed to evaluate protein models at fine-grained functional levels (residue, fragment, and domain), addressing the critical gap left by coarse…
Excited to see our Cell2Sentence collaboration with @GoogleAI @GoogleDeepMind featured in Nature News! Check it out here: nature.com/articles/d4158… 🧬
Excited to share what my team has been working on lately - Gemini diffusion! We bring diffusion to language modeling, yielding more power and blazing speeds! 🚀🚀🚀 Gemini diffusion is especially strong at coding. In this example the model generates at 2000 tokens/sec,…
We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
As a new paradigm, Prot2Text Model is becoming popular.
Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment 1. Prot2Text-V2 introduces a powerful framework that generates free-form natural language descriptions of protein function directly from amino acid sequences, moving beyond structured labels like GO…
Our new preprint, "SoftAlign: End-to-end protein structure alignment," is now on bioRxiv! biorxiv.org/content/10.110…