Dawei Li
@Dawei_Li_ASU
CS PhD @ ASU | https://david-li0406.github.io/ | LLMs, NLP, Data Mining | Founder of Oracle-LLM: https://oracle-llm.github.io/
🏆 Best Paper Award at DIG-BUGS@ICML 2025! 📢📢Thrilled to share that our work "Preference Leakage: A Contamination Problem in LLM-as-a-Judge" (arxiv.org/abs/2502.01534) has received the Best Paper Award at the #ICML2025 Workshop on Data in Generative Models (DIG-BUGS)! This is my…
Thanks for sharing! Give our paper a read if you're also interested in the evolving contamination problem in the era of AI oversight👇
🎉Big congrats to #SCAI doctoral student Dawei Li and advisor Regents Professor @liuhuan! Their work on preference leakage in LLM evaluation won Best Paper 🏆 at the ICML 2025 Data in Generative Models Workshop 🇨🇦 🤖🧠 A must-read on fairness in AI: 🔗 arxiv.org/abs/2502.01534
💡Chain-of-Thought (CoT) is a form of explainability, though not always faithful. Although the terms "explainability" and "interpretability" are sometimes used interchangeably and are self-explanatory for most people, a paper that aims to refute potential misuse of these…
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren't necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵
😵💫 Struggling with fine-tuning MoE? Meet DenseMixer, an MoE post-training method that offers more precise router gradients, making MoE easier to train and better performing! Blog: fengyao.notion.site/moe-posttraini…
Our preference leakage paper has been accepted to DIG-BUGS@ICML 2025!
📢Our new work "Preference Leakage: A Contamination Problem in LLM-as-a-Judge" has been released on arXiv! 🚀arXiv: arxiv.org/abs/2502.01534 🚀GitHub: github.com/David-Li0406/P… 🚀Website: llm-as-a-judge.github.io 🚀Hugging Face: huggingface.co/papers/2502.01… ⭐TL;DR: In this work, we…
🚀Excited to introduce BOW: A novel RL framework that rethinks vanilla next-word prediction as reasoning path exploration! Across 10 benchmarks, we show BOW leads to better zero-shot capabilities and next-word reasoning. 📄Paper: arxiv.org/pdf/2506.13502 🧵Details below
New for May 2025! * RL on something silly makes Qwen reason well v1 * RL on something silly makes Qwen reason well v2 * RL on something silly makes Qwen reason well v3 ...
Summary in case you missed any LLM research from the past month: * RL on math datasets improves math ability v1 * RL on math datasets improves math ability v2 * RL on math datasets improves math ability v3 * RL on math datasets improves math ability v4 * RL on math datasets...
🚀 Heading to #NAACL2025 in Albuquerque next week! If you're working on data synthesis, LLM-as-a-judge, or anything related to LLMs and data mining, feel free to reach out; I'd love to connect! Also, check out our two papers on alignment data synthesis and causal…
Intuit AI is looking for a summer intern with strong experience in LLM decoding-related research. Location: Mountain View. Feel free to DM me your resume.
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories We are releasing the first benchmark to evaluate how well automatic evaluators, such as LLM judges, can assess web agent trajectories. We find that rule-based evals underreport success rates, and…