Lucy Li
@lucy3_li
Postdoc (soon) @uwnlp. Incoming assistant prof @WisconsinCS. Prev @UCBerkeley, @allen_ai, @MSFTResearch, @stanfordnlp. More at http://lucy3.bsky.social.
I'm sadly not at #IC2S2 😭, but I will be at #ACL2025 in Vienna ☕️ next week!! Please spread the word that I'm recruiting prospective PhD students: lucy3.notion.site/for-prospectiv…
Loved today’s Guest Opinion Essay published in @nytopinion, written by @meghanor after her thought experiment with ChatGPT, which also mentions our paper «Your Brain on ChatGPT».
Today (w/ @UniofOxford @Stanford @MIT @LSEnews) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues. We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more 🧵
Honored to receive the outstanding position paper award at @icmlconf :) Come attend my talk and poster tomorrow on human-centered considerations for a safer and better future of work. I will be recruiting PhD students at @stonybrooku @sbucompsc this coming fall. Please get in touch.
Very excited for a new #ICML2025 position paper accepted as oral w @mbodhisattwa & @TuhinChakr! 😎 What are the longitudinal harms of AI development? We use economic theories to highlight AI’s intertemporal impacts on livelihoods & its role in deepening labor-market inequality.
.@stanfordnlp papers at @aclmeeting in Vienna next week:
• HumT DumT: Measuring and controlling human-like language in LLMs @chengmyra1 @sunnyyuych @jurafsky
• Controllable and Reliable Knowledge-Intensive Task Agents with Declarative GenieWorksheets @harshitj__ @ShichengGLiu…
🎉 The @MilaNLProc lab is excited to present 15 papers and 1 tutorial at #ACL2025 & workshops! Grateful to all our amazing collaborators, see everyone in Vienna! 🚀
issues w preference LM benchmarks
🐡 data contains cases where the "bad" response is just as good as the chosen one
🐟 model rankings can feel off (claude ranks lower than expected)
led by @cmalaviya11 (TACL 2025), we study underspecified queries & their detrimental effect on model evals
In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that adding just a bit of missing context can reorder model leaderboards—and surface hidden biases. 🧵👇
🗣️ Excited to share our new #ACL2025 Findings paper: “Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks” with @jad_kabbara and @dkroy. arXiv: arxiv.org/abs/2507.15821 Read about our findings ⤵️
🚨 The deadline for submitting proposals to host ICWSM 2027 is extended! Send your proposals (and questions) to Ceren Budak by August 15
We are seeking emergency reviewers for the NeurIPS 2025 ethics review. If you are interested and available to contribute this week, please sign up at tiny.cc/neurips2025eth…. @NeurIPSConf @trustworthy_ml @acidflask #AI #ML #NeurIPS
📢 I'm looking for a postdoc to join my lab at NYU! Come work with me on a principled, theory-driven approach to studying language, learning, and reasoning, in humans and AI agents. Apply here: apply.interfolio.com/170656 And come chat with me at #CogSci2025 if interested!
Does vision training change how language is represented and used in meaningful ways? 🤔 The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]
Join us at #NeurIPS2025 workshop to explore the future of multi-turn AI interactions! We welcome submissions on RL for agents, alignment, evaluation methods, and more.
🚀 Call for Papers — @NeurIPSConf 2025 Workshop on Multi-Turn Interactions in LLMs
📅 December 6/7 · 📍 San Diego Convention Center
Join us to shape the future of interactive AI. Topics include but are not limited to:
🧠 Multi-Turn RL for Agentic Tasks (e.g., web & GUI agents,…
I'm also kicking off year 2 of faculty life by doing all the challenging things people warned me not to do, starting by creating a new NLP course around ethics of large-scale language modeling. I welcome thoughts on this as well! I might document the process to help others.
📢 today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and predictable scaling is the exception, not the rule. a quick read about scaling law fails: 📜arxiv.org/abs/2507.00885 🧵1/5👇
Great science starts with great questions. 🤔✨ Meet AutoDS—an AI that doesn’t just hunt for answers, it decides which questions are worth asking. 🧵
🤔 Have you ever wondered how good ModernBERT is compared to decoders like Llama? We made an open-data version of ModernBERT and used the same recipe for encoders and decoders. Turns out, our encoder model beats ModernBERT and our decoder model beats Llama 3.2 / SmolLM2 🤯 🧵