Yating Wu
@YatingWu96
ECE Ph.D. student @ UT Austin, advised by @jessyjli and @AlexGDimakis | Doctoral student at the University of Texas
LLMs can mimic human curiosity by generating open-ended inquisitive questions given some context, similar to how humans wonder as they read. But which questions are more important to answer?🤔 We predict the salience of questions, substantially outperforming GPT-4.🌟 🧵1/5

Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrases 👀? ✨We introduce QUDsim, to quantify discourse similarities beyond lexical, syntactic, and content overlap.
🌟Job ad🌟 We (@gregd_nlp, @mattlease and I) are hiring a postdoctoral fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve mysteries of the universe, please apply!
Seeking candidates for a postdoctoral position with the Explorable Universe research group, researching and developing next-generation generative AI copilots & agents to aid astronomy research. Info here cosmicai.org/jobs/postdocge…
LLMs trained to memorize new facts can’t use those facts well.🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡 Our approach, PropMEND, extends MEND with a new objective for propagation.
Language is often strategic, but LLMs tend to play nice. How strategic are they really? Probing into that is key for future safety alignment.🛟 👉Introducing CoBRA🐍, a framework that assesses strategic language. Work with my amazing advisors @jessyjli and @David_Beaver! 🧵👇
How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
A key hypothesis in the history of linguistics is that different constructions share underlying structure. We take advantage of recent advances in mechanistic interpretability to test this hypothesis in Language Models. New work with @kmahowald and @ChrisGPotts! 🧵👇
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to verbalize via text CoT 📉Humans reach 93% accuracy, but Gemini-2.5-Pro gets only 63% and Qwen2.5-72B only 38%
Can we generate long text from a compressed KV cache? We find existing KV cache compression methods (e.g., SnapKV) degrade rapidly in this setting. We present 𝐑𝐞𝐟𝐫𝐞𝐬𝐡𝐊𝐕, an inference-time method that ♻️ periodically refreshes the smaller KV cache, better preserving performance.
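A rough sketch of the refresh idea (hypothetical helper names, not the paper's implementation): decode against a small KV cache, but keep the full cache on the side and periodically re-select the small cache from it, so long generations don't degrade as they would with a one-shot compression.

```python
# Minimal sketch of a periodic KV-cache refresh loop. All names here are
# illustrative; real compression methods (e.g., SnapKV) score entries by
# attention weights rather than recency.

def select_entries(full_cache, k):
    # Placeholder compression step: keep the k most recent cache entries.
    return full_cache[-k:]

def generate(decode_step, prompt_cache, num_tokens, k=8, refresh_interval=4):
    full_cache = list(prompt_cache)              # full KV cache, kept on the side
    small_cache = select_entries(full_cache, k)  # compressed cache used for decoding
    output = []
    for step in range(num_tokens):
        # decode_step attends only over the small cache and returns
        # the next token plus its new KV entry.
        token, kv_entry = decode_step(small_cache)
        full_cache.append(kv_entry)              # full cache keeps growing
        small_cache.append(kv_entry)
        output.append(token)
        if (step + 1) % refresh_interval == 0:
            # Refresh: re-select the small cache from the up-to-date full cache.
            small_cache = select_entries(full_cache, k)
    return output
```

The key design point is that compression is re-done against the *current* full cache, so tokens generated mid-sequence can re-enter the working set instead of being permanently evicted.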
Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper: an interpretable framework for salience analysis in LLMs. First of all, information salience is a fuzzy concept. So how can we even measure it?
Constitutional AI works great for aligning LLMs, but the principles can be too generic to apply. Can we guide responses with context-situated principles instead? Introducing SPRI, a system that produces principles tailored to each query, with minimal to no human effort. [1/5]
🔔 I'm recruiting multiple fully funded MSc/PhD students @UAlberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!
github.com/mlfoundations/… I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval etc as well as standard pre-training…
I'm thrilled to announce our paper "Which questions should I answer? Salience Prediction of Inquisitive Questions" has won an Outstanding Paper Award at EMNLP 2024🥳🥳. Thank you so much to my amazing co-authors and advisors!!! @ritikarmangla, @AlexGDimakis, @gregd_nlp, @jessyjli

Correction: This will be happening tomorrow - Nov 13 (Wed) from 11:15 to 11:30 AM ET in conference room Flagler.
✨ Exciting news! I’ll be presenting our work at #EMNLP2024 in an oral session on Nov 13 (Wed) from 10:15 to 10:30 AM (Session 6). Come say hi!
Why are there so many different methods for using diffusion models for inverse problems? 🤔 And how do these methods relate to each other? In this survey, we review more than 35 different methods and we attempt to unify them into common mathematical formulations.
🎩“The math of the people, by the people, for the people, shall not perish from our models” AI math systems often abstract away from language by augmenting LLMs with symbolic solvers and logical systems. While promising, is something lost? 🧵
One of the big problems in AI is that the systems often hallucinate. What does that mean exactly and how do we mitigate this problem, especially for RAG systems? 1. Hallucinations and Factuality Factuality refers to the quality of being based on generally accepted facts. For…
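As a toy illustration of grounded factuality checking for RAG (not any particular system's method): flag a generated sentence as a potential hallucination when too little of its content appears in the retrieved evidence. Real checkers like trained entailment models are far more robust than this word-overlap stand-in.

```python
# Crude grounding check, for illustration only: a claim is "grounded" if
# enough of its content words appear in the retrieved evidence. Trained
# fact-checkers replace this overlap score with a learned entailment model.

STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "in", "on", "to", "it", "and"}

def content_words(text):
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS

def is_grounded(claim, evidence, threshold=0.6):
    claim_words = content_words(claim)
    if not claim_words:
        return True
    overlap = claim_words & content_words(evidence)
    return len(overlap) / len(claim_words) >= threshold
</antml>```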
🧐🔡🤖 Can LMs/NNs inform CogSci? This question has been (re)visited by many people across decades. @najoungkim and I contribute to this debate by using NN-based LMs to generate novel experimental hypotheses which can then be tested with humans!
Excited to launch the first model from our startup: Bespoke Labs. Bespoke-Minicheck-7B is a grounded factuality checker: super lightweight and fast. Outperforms all big foundation models, including Claude 3.5 Sonnet, Mistral-Large m2, and GPT-4o, and it's only 7B. Also, I want to…
🤔 Want to know if your LLMs are factual? You need LLM fact-checkers. 📣 Announcing the LLM-AggreFact leaderboard to rank LLM fact-checkers. 📣 Want the best model? Check out @bespokelabsai's Bespoke-Minicheck-7B model, which is the current SOTA fact-checker and is cheap and…