Jessy Li
@jessyjli
Associate Professor @UT_Linguistics, computational linguistics and #NLProc
Excited to share that QUDsim has been accepted to #COLM2025!! 🎉🎉
Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrases? ✨We introduce QUDsim to quantify discourse similarities beyond lexical, syntactic, and content overlap.
Check out this new opinion piece from Sebastian and Lily! We have really powerful AI systems now, so what’s the bottleneck preventing the wider adoption of fact checking systems, in high stakes scenarios like medicine? It’s how we define the tasks 👇
Are we fact-checking medical claims the right way? 🩺🤔 Probably not. In our study, even experts struggled to verify Reddit health claims using end-to-end systems. We show why—and argue fact-checking should be a dialogue, with patients in the loop arxiv.org/abs/2506.20876 🧵1/
I am excited to present our study on information salience in LLMs today at #ACL2025NLP (x4/x5, Tue, 16:00--17:30). Please come by if you are interested! 📝 Behavioral Analysis of Information Salience in Large Language Models With @jschloetterer @jessyjli @SeifertChristin
Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper: an interpretable framework for salience analysis in LLMs. First of all, information salience is a fuzzy concept. So how can we even measure it?
Looking forward to attending #cogsci2025! I’m especially excited to meet students who will be applying to PhD programs in Computational Ling/CogSci in the coming cycle. Please reach out if you want to meet up and chat! Email is best, but DM also works if you must. Quick 🧵:
Tuesday at #ACL2025: @jantrienes will be presenting this from 4-5:30pm in x4/x5! Turns out LLMs’ content selection is highly consistent across models, but not so much with their own notion of importance or with humans’…
🇦🇹 I’m on my way to #ACL2025 to help present two papers (🧵s below) ➡️ MAT-Steer (07/30 at 11am), our method for steering LLMs w/ multiple attributes (e.g. truthfulness, bias reduction, and toxicity mitigation) simultaneously. ➡️ LAQuer (07/28 at 11am), a new task/framework for…
Extremely excited to announce that I will be joining @UTAustin @UTCompSci in August 2025 as an Assistant Professor! 🎉 I’m looking forward to continuing to develop AI agents that interact/communicate with people, each other, and the multimodal world. I’ll be recruiting PhD…
Heading to Vienna tomorrow for #ACL2025! Monday: Will Sheffield will be presenting his work on just the sneaky little discourse particle called JUST 😁 Just stop by Hall X4/X5 6-7:30pm CEST! Paper aclanthology.org/2025.findings-… w/ @kanishkamisra @valentina__py Ashwini Deo @kmahowald

New study on LMs and discourse sensitivity! We evaluate 25 LMs on their ability to prioritize discourse-relevant info, and find that (1) smaller & dialogue-trained models align more closely with human patterns, while (2) larger/instruction-tuned models overuse structural or discourse cues.
Welcome to UT, Jiaxin!!!! 🥳
Life Update: I will join @UTiSchool as an Assistant Professor in Fall 2026 and will continue my work on LLM, HCI, and Computational Social Science. I'm building a new lab on Human-Centered AI Systems and will be hiring PhD students in the coming cycle!
"Seeing" robins and sparrows may not necessarily make them birdier to LMs! Super excited about this paper -- massive shoutout to all my co-authors, especially @yulu_qin and @dhevarghese for leading the charge!
Does vision training change how language is represented and used in meaningful ways?🤔 The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]
What are patients saying about GLP-1 meds and cancer risk? We analyzed 400K+ Reddit posts using an AI-powered pipeline, revealing major communication gaps: cancer-risk discussions were rare overall, and of those only 19% mentioned talking to a doctor. 🧵 @JAMANetworkOpen
👇Happening this afternoon 4:30pm! Come meet @Yurochkin_M, @RayaHoresh, and me at East Exhibition Hall #1103. 📍I’m also on the industry job market this coming year! Let’s connect and chat about opportunities in the industry :)
I'll be at #icml2025 @icmlconf to present SPRI next week! Come by our poster on Tuesday, July 15, 4:30pm, and let’s catch up on LLM alignment! 😃 🚀TL;DR: We introduce Situated-PRInciples (SPRI), a framework that automatically generates input-specific principles to align…
Happy to share that EvalAgent has been accepted to #COLM2025 @COLM_conf 🎉🇨🇦 We introduce a framework to identify implicit and diverse evaluation criteria for various open-ended tasks! 📜 arxiv.org/pdf/2504.15219
Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent finds 👩🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
If you’ll be at #icml2025, check out Hongli’s work on context-specific principles!
We have very good frameworks for cooperative dialog… but how about the opposite? @Asher_Zheng00’s new paper takes a game-theoretic view and develops new metrics to quantify non-cooperative language ♟️ Turns out LLMs don’t have the pragmatic capabilities to perceive these…
Language is often strategic, but LLMs tend to play nice. How strategic are they really? Probing into that is key for future safety alignment.🛟 👉Introducing CoBRA🐍, a framework that assesses strategic language. Work with my amazing advisors @jessyjli and @David_Beaver! 🧵👇
CosmicAI collab: benchmarking the utility of LLMs in astronomy coding workflows & focusing on the key research capability of scientific visualization. @sebajoed @jessyjli @Murtazahusaintx @gregd_nlp @StephaJuneau @paultorrey9 Adam Bolton, Stella Offner, Juan Frias, Niall Gaffney
How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
Is AI ready to play a real role in science? This work with @CosmicAI_Inst evaluates LLMs targeting the implementation of scientific workflows, and the scientific utility of visualizations from LLM-generated code -- and the answer is not yet, even with the best SOTA models 👇
Super thrilled that @kanishkamisra is going to join @UT_Linguistics as our newest computational linguistics faculty member -- looking forward to doing great research together! 🧑🎓Students: Kanishka is a GREAT mentor -- apply to be his PhD student in the upcoming cycle!!
News🗞️ I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘 Excited to develop ideas about linguistic and conceptual generalization! Recruitment details soon