Lei Li
@lileics
Generative AI for language and science. MT, LLM, GenAI Safety, Drug Discovery
The show is on. Welcome to 2025 Generative AI for Biology workshop. 7 invited talks + a panel with 5 panelists + 14 spotlight talks + 121 poster presentations! Huge thanks to the workshop sponsors: Genesis Therapeutics, Genbio AI, and Tencent! genbio-workshop.github.io/2025/

We have an excellent lineup of distinguished speakers at the Gen AI for Bio workshop! Join us in the East Exhibition Hall A on July 18, starting at 8:45am. #GenBio2025 #ICML2025
Hope to see you all tomorrow at the GenAI & Bio workshop!! #ICML2025 Schedule: genbio-workshop.github.io/2025/
We are presenting PPDiff for protein complex design at #ICML2025, West Exhibition Hall B2, #W-119, 11am-1:30pm today (7/17). Come visit! @ZhenqiaoSong Key idea: sequence-structure co-design + hybrid diffusion. Paper: arxiv.org/abs/2506.11420

DISCO paper website: avduarte333.github.io/projects/disco/
#ICML2025 Andre @avduarte3333 and I are presenting DISCO: a new method to discover copyrighted content in a VLM's training data (without access to it). Come visit our poster at Vancouver Convention Center, East Exhibition Hall A #900, at 3pm 7/16. arxiv.org/abs/2502.17358
Just delivered 4 lectures (50 minutes each, 3 hours 20 minutes total) in a row at the Advanced Course on Data Science and Machine Learning (acdl2025.icas.events). Wonderful to have conversations with the ACDL participants! Thanks to the directors, Giuseppe Nicosia and Panos Pardalos.




We are organizing the Generative AI for Biology workshop at #ICML2025. Welcome to submit any relevant work on AI for biomolecules, AI models for biological systems, AI and experiments, agents for bio discovery, new datasets and tools, etc. The deadline is May 25th. genbio-workshop.github.io/2025/
⏰ Deadline extended to May 25th for GenAI and Biology workshop, considering multiple requests & NeurIPS deadline! 🚀Recent submissions to NeurIPS & other conferences/journals are welcome! 🧬For amazing speakers and more details: genbio-workshop.github.io/2025/
Better than LoRA! You only need to train as few as 18 token embeddings of LLaMA to achieve superior translation performance on new languages. KS-Lottery provides a statistically sound method to find an extremely small number of LLM embedding parameters to fine-tune!
I will give a talk at 11:15am today in Ruidoso at #NAACL2025 about KS-Lottery: finding a small number of token embeddings in an LLM that are effective for fine-tuning. Surprising finding: 18 tokens are enough!
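For readers curious how such a selection could look: a minimal sketch, assuming KS-Lottery ranks token embeddings by the two-sample Kolmogorov-Smirnov distance between their pre- and post-fine-tuning value distributions. Function names and the top-k cutoff here are illustrative, not the paper's exact procedure.

```python
# Illustrative sketch, NOT the paper's exact procedure: rank token
# embeddings by the Kolmogorov-Smirnov distance between their pre- and
# post-fine-tuning value distributions, and keep only the top-k tokens
# as trainable embeddings.

def ks_statistic(a, b):
    """Two-sample KS statistic: max gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:            # tie: advance both pointers
            i += 1
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def select_lottery_tokens(emb_before, emb_after, k=18):
    """Pick the k tokens whose embedding value distributions shifted most."""
    scores = {tok: ks_statistic(emb_before[tok], emb_after[tok])
              for tok in emb_before}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Only the selected tokens' embedding rows would then be left trainable; everything else stays frozen.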
How to reduce latency for simultaneous (text) translation? Siqi proposes TAF: the key idea is to forecast source-side continuations of the utterance before the actual input arrives, and then use majority voting over the resulting candidate translations. arxiv.org/abs/2410.22499 #NAACL2025
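The forecast-then-vote idea can be sketched as one decision step, assuming hypothetical `forecast_fn` and `translate_fn` callables standing in for the paper's actual models:

```python
from collections import Counter

def taf_step(src_prefix, forecast_fn, translate_fn, n_samples=5, min_votes=3):
    """One illustrative TAF decision step: sample several possible
    continuations of the source, translate each completed source, and
    emit the next target token only when enough translations agree;
    otherwise return None, i.e. wait for more real input."""
    votes = Counter()
    for _ in range(n_samples):
        continuation = forecast_fn(src_prefix)         # e.g. an LLM sample
        target_tokens = translate_fn(src_prefix + continuation)
        if target_tokens:
            votes[target_tokens[0]] += 1
    if not votes:
        return None
    token, count = votes.most_common(1)[0]
    return token if count >= min_votes else None
```

When the forecasts disagree, the step declines to commit, which is how latency savings trade off against translation quality.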
Excited to be at #NAACL2025 in Albuquerque! We have two papers on simultaneous translation 🎉 1️⃣ Anticipating Future with Large Language Model for Simultaneous Machine Translation 🗓 Apr 30, 11:45–12:00 @ Ruidoso (Oral) 🔗 arxiv.org/abs/2410.22499 2️⃣ CA*: Addressing Evaluation…
Simultaneous translation aims to reduce latency while retaining translation quality, but measuring latency turns out to be non-trivial. Xi and Siqi's new work proposes a highly accurate method, CA*, to measure latency in simultaneous translation by taking actual inference time into account. #NAACL2025



Can AI text detectors identify LLM-generated code, paper reviews, abstracts, translations, and summaries? Brian is presenting a new study of existing AI text detectors on LLM-generated content at #NAACL2025. TL;DR: all existing detectors work poorly. arxiv.org/abs/2412.05139
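For context on how such detectors are typically scored, here is a generic evaluation harness (not the paper's protocol); `detect` is any hypothetical callable returning a score in [0, 1]:

```python
def detector_accuracy(detect, samples, threshold=0.5):
    """Generic scoring harness (illustrative): `detect` maps a text to a
    score in [0, 1]; `samples` is a list of (text, is_ai_generated)
    pairs. Returns classification accuracy at the given threshold."""
    correct = sum((detect(text) >= threshold) == label
                  for text, label in samples)
    return correct / len(samples)
```

Running a harness like this per domain (code, reviews, abstracts, translations, summaries) is what exposes how unevenly detectors generalize.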



Kexun is presenting OSCA (Optimal Sample Compute Allocation) at #NAACL2025 in Hall 3 (#50). The paper presents an optimization algorithm to find optimal configurations for LLM inference. arxiv.org/abs/2410.22480
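A hedged sketch of one way sample-compute allocation can be posed (illustrative only; OSCA's actual objective and algorithm may differ): assume each configuration covers its own slice of problems with a fixed per-sample success rate `p` and per-sample cost `c`, so `n` samples from it yield coverage `1 - (1 - p)**n`, and greedily buy the sample with the best marginal coverage gain per unit cost.

```python
def allocate_samples(configs, budget):
    """Greedy sketch of allocating a sample budget across inference
    configurations. configs maps name -> (p, c) where p is the
    per-sample success rate and c the per-sample cost; the (assumed)
    objective is the sum of per-config coverages 1 - (1 - p)**n."""
    alloc = {name: 0 for name in configs}
    p_all_fail = {name: 1.0 for name in configs}   # P(all samples so far fail)
    spent = 0.0
    while True:
        best, best_gain = None, 0.0
        for name, (p, c) in configs.items():
            if spent + c > budget:
                continue
            gain = p_all_fail[name] * p / c        # marginal coverage per cost
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            return alloc                           # budget exhausted
        p, c = configs[best]
        alloc[best] += 1
        p_all_fail[best] *= 1.0 - p
        spent += c
```

The diminishing-returns term `p_all_fail` is what keeps the greedy from dumping the whole budget into a single configuration once its coverage saturates.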



Excited to visit ABQ! We are presenting six papers at #NAACL2025 on simultaneous translation/speech translation, inference-time optimization, finding lottery tickets in LLMs, AI text detection, and language agents for task planning. I am here the full week. Feel free to DM.




This work was partially done with @lileics and @yuxiangw_cs during our time at @UCSB Poster attached for a better overview! 🎯
The 2nd Generative AI and Biology workshop will be co-located with ICML 2025 in Vancouver this year (July 18/19, 2025). CFP: genbio-workshop.github.io/2025/ We have a fantastic lineup of speakers. @MengdiWang10 @ericxing @marinkazitnik @StefanoErmon @MinkaiX @ZhenqiaoSong
Hi everyone, we are so back! Delighted to announce the 2nd Generative AI and Biology (GenBio) Workshop @icmlconf #icml2025! Join us in this exciting discourse on all aspects of the future of #GenerativeAI and Biology!! 🧬🚀 Website: genbio-workshop.github.io/2025/ 1/n
A newly minted Dr.! Congratulations to @WendaXu2 for successfully defending his PhD thesis "On Evaluation and Efficient Post-training for LLMs". Highly recommend his slides, covering RL training, better KD, LLM/text-gen evaluation, and bias in LLM-as-a-judge: docs.google.com/presentation/d…
[Life update] 🎉 I successfully defended my PhD thesis "On Evaluation and Efficient Post-training for LLMs" @ucsbNLP and am officially a PhD! Huge thanks to my advisors @WilliamWangNLP @lileics, my committee @markuseful & Simon Todd, and everyone who supported me during my PhD…
Excited to announce our 2025 keynote speakers: @cosmo_shirley, Nicholas Carlini, @LukeZettlemoyer, and Tom Griffiths!
Congratulations, Dr. Sun @EdwardSun0909! Zhiqing's PhD thesis on scalable alignment of LLMs is a must-read if you work on LLMs.
I successfully defended my PhD thesis today! 🎉 "Scalable Alignment of Large Language Models Towards Truth-Seeking, Complex Reasoning, and Human Values" Slides (Fact-RLHF, Lean-STaR, Easy-to-Hard Generalization, Self-Align, Instructable Reward Model): docs.google.com/presentation/d… A…
A new comprehensive multilingual (and multitask) evaluation suite for LLMs (covering 17 diverse languages), developed by @xuhuang87 and folks! Check out BenchMAX at github.com/CONE-MT/BenchM…
🤩Excited to announce our new work BenchMAX!🥳 BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper: huggingface.co/papers/2502.07… Repo: github.com/CONE-MT/BenchM… Datasets: huggingface.co/collections/LL…