Dawei Li
@Dawei_Li_ASU
CS PhD @ ASU | https://david-li0406.github.io/ | LLMs, NLP, Data Mining | Founder of Oracle-LLM: https://oracle-llm.github.io/
🏆 Best Paper Award at DIG-BUGS@ICML 2025! 📢📢Thrilled to share that our work "Preference Leakage: A Contamination Problem in LLM-as-a-Judge" (arxiv.org/abs/2502.01534) has received the Best Paper Award at the #ICML2025 Workshop on Data in Generative Models (DIG-BUGS)! This is my…
Thanks for sharing! Give our paper a read if you're also interested in the evolving contamination problem in the era of AI oversight👇
🎉Big congrats to #SCAI doctoral student Dawei Li and advisor Regents Professor @liuhuan! Their work on preference leakage in LLM evaluation won Best Paper 🏆 at the ICML 2025 Data in Generative Models Workshop 🇨🇦 🤖🧠 A must-read on fairness in AI: 🔗 arxiv.org/abs/2502.01534
💡Chain-of-Thought (CoT) is a form of explainability, though not always faithful. Although the terms "explainability" and "interpretability" are sometimes used interchangeably and are self-explanatory for most people, a paper that aims to refute potential misuse of these…
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren't necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵
😵💫 Struggling with fine-tuning MoE? Meet DenseMixer, an MoE post-training method that offers more precise router gradients, making MoE easier to train and better performing! Blog: fengyao.notion.site/moe-posttraini…
Our preference leakage paper has been accepted to DIG-BUGS@ICML 2025!
📢Our new work "Preference Leakage: A Contamination Problem in LLM-as-a-Judge" has been released on arXiv! 🚀arXiv: arxiv.org/abs/2502.01534 🚀GitHub: github.com/David-Li0406/P… 🚀Website: llm-as-a-judge.github.io 🚀Hugging Face: huggingface.co/papers/2502.01… ⭐TL;DR: In this work, we…
🚀Excited to introduce BOW: A novel RL framework that rethinks vanilla next-word prediction as reasoning path exploration! Across 10 benchmarks, we show BOW leads to better zero-shot capabilities and next-word reasoning. 📄Paper: arxiv.org/pdf/2506.13502 🧵Details below
New for May 2025! * RL on something silly makes Qwen reason well v1 * RL on something silly makes Qwen reason well v2 * RL on something silly makes Qwen reason well v3 ...
Summary in case you missed any LLM research from the past month: * RL on math datasets improves math ability v1 * RL on math datasets improves math ability v2 * RL on math datasets improves math ability v3 * RL on math datasets improves math ability v4 * RL on math datasets...
🚀 Heading to #NAACL2025 in Albuquerque next week! If you're working on data synthesis, LLM-as-a-judge, or anything related to LLMs and data mining, feel free to reach out; I'd love to connect! Also, check out our two papers on alignment data synthesis and causal…
Intuit AI is looking for a summer intern with strong experience in LLM decoding-related research. Location: Mountain View. Feel free to DM me your resume.
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories We are releasing the first benchmark to evaluate how well automatic evaluators, such as LLM judges, can assess web agent trajectories. We find that rule-based evals underreport success rates, and…