Leo Liu
@ZEYULIU10
PhD at UT Austin ex-{uw, isi, facebook} nlper
LLMs trained to memorize new facts can’t use those facts well.🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡 Our approach, PropMEND, extends MEND with a new objective for propagation.
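The gradient-editing idea in miniature, as a toy numpy sketch (the actual PropMEND hypernetwork, objective, and parameterization are in the paper; `raw_gradient`, `hypernet_edit`, `U`, and `V` below are illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

def raw_gradient(w, x, y):
    """Gradient of the toy fact-injection loss 0.5*(w.x - y)^2 w.r.t. w."""
    return (w @ x - y) * x

def hypernet_edit(grad, U, V):
    """Toy 'hypernetwork': a low-rank transform applied to the raw gradient.
    In MEND-style editors this transform is learned; U, V here are fixed stand-ins."""
    return grad + U @ (V @ grad)

w = rng.normal(size=dim)                            # toy model parameters
x = rng.normal(size=dim); x /= np.linalg.norm(x)    # direction encoding the new fact
y = 3.0                                             # value the fact should produce
U = 0.1 * rng.normal(size=(dim, 2))
V = 0.1 * rng.normal(size=(2, dim))

for _ in range(100):                # apply edited, not raw, gradients
    w -= 0.5 * hypernet_edit(raw_gradient(w, x, y), U, V)

print(round(float(w @ x), 2))  # -> 3.0, the injected fact is now stored
```

The point of the sketch is only the control flow: the editor never applies the raw fact-injection gradient directly, it first passes it through a (learned, in the real system) transform.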


Check out our work led by @Cumquaaa on a hybrid autoregressive-diffusion architecture for image generation -- it flexibly balances the number of autoregressive and diffusion layers for optimal generation quality and inference speed! Autoregressive vs. diffusion -- you don't have…
🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MADFormer with half of the model layers for AR and half for diffusion. AR gives a fast guess for the next patch prediction while diffusion helps refine the…
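A toy sketch of the layer-budget idea described above (not MADFormer's actual architecture; every block below is a random stand-in): the first half of the depth makes one fast AR pass, and the second half iteratively refines that guess:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N_LAYERS = 16, 8
AR_LAYERS = N_LAYERS // 2            # half the depth for the AR pass...
W = [0.1 * rng.normal(size=(D, D)) for _ in range(N_LAYERS)]

def ar_half(ctx):
    """One fast forward pass through the AR layers -> coarse next-patch guess."""
    h = ctx
    for i in range(AR_LAYERS):
        h = np.tanh(h @ W[i])
    return h

def diffusion_half(guess, steps=4):
    """...and half for diffusion layers that iteratively refine the AR guess."""
    h = guess
    for _ in range(steps):
        for i in range(AR_LAYERS, N_LAYERS):
            h = h + 0.1 * np.tanh(h @ W[i])   # toy residual denoising update
    return h

ctx = rng.normal(size=D)              # stand-in for the encoded context
patch = diffusion_half(ar_half(ctx))  # fast guess, then refinement
print(patch.shape)  # (16,)
```

Shifting `AR_LAYERS` up or down is the quality/speed dial: more AR layers means fewer refinement passes per patch, and vice versa.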
🚨Excited to announce MEXA, a general multimodal reasoning framework that: -- Dynamically selects query-relevant expert models -- Performs deep reasoning over the expert outputs -- Is training-free and easy to generalize to a wide range of tasks/modalities Check out the paper/thread for more details 👇!
New paper Alert 🚨 Introducing MEXA: A general and training-free multimodal reasoning framework via dynamic multi-expert skill selection, aggregation and deep reasoning! MEXA: 1. Selects task- and modality-relevant experts based on the query and various required multimodal…
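How training-free expert selection and aggregation might look in miniature (the experts, the overlap-based relevance score, and the aggregation below are all hypothetical; MEXA's deep-reasoning step uses an LLM over the expert outputs rather than string concatenation):

```python
def relevance(query, expert):
    """Toy relevance: keyword overlap between the query and an expert's tags."""
    return len(set(query.lower().split()) & expert["tags"])

def mexa_style_answer(query, experts, select_k=2):
    """Pick the k most query-relevant experts, run them, aggregate their outputs.
    No training anywhere: selection and aggregation are computed on the fly."""
    scored = sorted(experts, key=lambda e: relevance(query, e), reverse=True)
    outputs = [e["run"](query) for e in scored[:select_k]]
    return " | ".join(outputs)

# Hypothetical expert pool spanning modalities/skills.
experts = [
    {"name": "ocr",   "tags": {"text", "read", "sign"},      "run": lambda q: "OCR: 'STOP'"},
    {"name": "depth", "tags": {"far", "distance", "depth"},  "run": lambda q: "depth: 12m"},
    {"name": "audio", "tags": {"sound", "speech"},           "run": lambda q: "audio: silence"},
]
print(mexa_style_answer("how far is the stop sign", experts))  # OCR: 'STOP' | depth: 12m
```

Because nothing is trained, adding a new task or modality is just appending another entry to the expert pool.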
Meet Casper👻, a friendly robot sidekick who shadows your day, decodes your intents on the fly, and lends a hand while you stay in control! Instead of passively receiving commands, what if a robot actively sensed what you need in the background and stepped in when confident? (1/n)
🤔 Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval? 📣 Introducing QRHeads (query-focused retrieval heads) that enhance retrieval Main contributions: 🔍 Better head detection: we find a…
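A toy sketch of the retrieval idea: score each document by the attention mass a few selected heads place on its tokens (in QRHeads the heads come from a trained LM via a detection procedure; the near-identity projections here are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 16

# Hypothetical 'retrieval heads': near-identity projections standing in for
# heads detected inside a trained LM (the paper's detection method differs).
heads = [(np.eye(D) + 0.05 * rng.normal(size=(D, D)),
          np.eye(D) + 0.05 * rng.normal(size=(D, D))) for _ in range(2)]

def rank_docs(query_vec, docs):
    """Rank docs by the attention mass the selected heads put on their tokens
    when attending from the query over the concatenation of all doc tokens."""
    keys = np.vstack(docs)                                 # all tokens, all docs
    owner = np.repeat(np.arange(len(docs)), [len(d) for d in docs])
    mass = np.zeros(len(docs))
    for Wq, Wk in heads:
        s = (query_vec @ Wq) @ (keys @ Wk).T / np.sqrt(D)  # one head's scores
        a = np.exp(s - s.max()); a /= a.sum()              # softmax over tokens
        np.add.at(mass, owner, a)                          # per-doc attention mass
    return np.argsort(-mass)                               # best-scoring doc first

docs = [rng.normal(size=(5, D)) for _ in range(3)]
q = docs[1].mean(axis=0)        # toy query aligned with doc 1's content
ranking = rank_docs(q, docs)
```

The key design choice is that the ranking reuses attention computed inside the model, so no separate retriever has to be trained.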
Super thrilled that @kanishkamisra is going to join @UT_Linguistics as our newest computational linguistics faculty member -- looking forward to doing great research together! 🧑🎓Students: Kanishka is a GREAT mentor -- apply to be his PhD student in the upcoming cycle!!
News🗞️ I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘 Excited to develop ideas about linguistic and conceptual generalization! Recruitment details soon
Solving complex problems with CoT requires combining different skills. We can do this by: 🧩Modifying the CoT data format to be “composable” with other skills 🔥Training models on each skill 📌Combining those models This leads to better 0-shot reasoning on tasks involving skill composition!
Have you thought about making your reasoning model stronger through *skill composition*? It's not as hard as you'd imagine! Check out our work!!!
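If "combine those models" is done by weight merging (one common composition method; the paper's actual procedure may differ), the combination step might look like this toy sketch:

```python
import numpy as np

def merge_skill_models(models, weights=None):
    """Combine skill-specialized checkpoints by weighted parameter averaging.
    (Illustrative only; not necessarily the paper's combination method.)"""
    weights = weights or [1.0 / len(models)] * len(models)
    return {name: sum(w * m[name] for w, m in zip(weights, models))
            for name in models[0]}

# Toy 'checkpoints' for two skills, same architecture (one weight matrix each).
rng = np.random.default_rng(4)
skill_a = {"layer0.w": rng.normal(size=(4, 4))}
skill_b = {"layer0.w": rng.normal(size=(4, 4))}
merged = merge_skill_models([skill_a, skill_b])
print(merged["layer0.w"].shape)  # (4, 4)
```

Whatever the exact combination operator, the pipeline shape is the same: one checkpoint per skill, one cheap merge, zero-shot evaluation on the composed task.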
Check out our cool VQA dataset that challenges VLMs' reasoning capabilities!!
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to verbalize via text CoTs 📉Humans reach 93%, but Gemini-2.5-Pro gets only 63% and Qwen2.5-72B just 38%
🤔Now most LLMs have >= 128K context sizes, but are they good at generating long outputs, such as writing 8K token chain-of-thought for a planning problem? 🔔Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize…
🎓 I am recruiting MSc/PhD students @UAlberta for Fall 2025! 🎯 Directions: Neurosymbolic programming, LLM-based programming support, and human-centered synthesis frameworks. See my website for full details on potential research opportunities.
What's the only thing better than NASSLLI (North American Summer School for Logic, Language and Information)? Summer in Seattle! Luckily, you can have both: I'm hosting NASSLLI at UW this summer. nasslli25.shane.st Please share widely, submit proposals, and attend!
This project started with us annoyed at papers evaluating CoT "reasoning" with only GSM8k & MATH. We didn't expect to find such strong evidence that these are the only type of problem where CoT helps! Credit to @juand_r_nlp & @kmahowald for driving the rigorous meta-analysis!
To CoT or not to CoT?🤔 300+ experiments with 14 LLMs & systematic meta-analysis of 100+ recent papers 🤯Direct answering is as good as CoT except for math and symbolic reasoning 🤯You don’t need CoT for 95% of MMLU! CoT mainly helps LLMs track and execute symbolic computation
👽Have you ever accidentally opened a .jpeg file with a text editor (or a hex editor)? Your language model can learn from these seemingly gibberish bytes and generate images with them! Introducing *JPEG-LM* - an image generator that uses exactly the same architecture as LLMs…
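The "bytes as tokens" idea in miniature, with a byte-level bigram model standing in for the full LLM (the data below is synthetic filler between JPEG-style markers, not a real JPEG):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic byte stream bracketed by JPEG SOI/EOI markers (NOT a real JPEG);
# JPEG-LM itself trains a standard LLM architecture on real JPEG bytes.
body = rng.integers(0, 256, size=1000, dtype=np.uint8).tobytes()
data = b"\xff\xd8" + body + b"\xff\xd9"

counts = np.ones((256, 256))                 # Laplace-smoothed bigram counts
for a, b in zip(data, data[1:]):
    counts[a, b] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

def sample(prefix=b"\xff\xd8", n=16):
    """Autoregressively sample bytes, exactly like next-token LM decoding."""
    out = bytearray(prefix)
    for _ in range(n):
        out.append(int(rng.choice(256, p=probs[out[-1]])))
    return bytes(out)

gen = sample()
print(len(gen))  # 18
```

The vocabulary is just the 256 byte values, so the generator needs no image-specific tokenizer or architecture changes, which is the point of the paper's framing.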