Robin Jia
@robinomial
Assistant Professor @CSatUSC | Previously Visiting Researcher @facebookai | Stanford CS PhD @StanfordNLP
Hi all, after a month in hiding in Vienna, I will be attending #ACL2025! Please reach out if you are interested in chatting about LLM memorization, law/policy, or large-scale pretraining. I am also interested in AI safety topics like sycophancy if you have results to share. Lmk!
I'll be at #ACL2025 next week! Catch me at the poster sessions, eating sachertorte and schnitzel, and speaking about distributional memorization at the @l2m2_workshop
I’ll be at ACL 2025 next week, where my group has papers on evaluating evaluation metrics, watermarking training data, and mechanistic interpretability. I’ll also be co-organizing the first Workshop on LLM Memorization @l2m2_workshop on Friday. Hope to see lots of folks there!
1+1=3 2+2=5 3+3=? Many language models (e.g., Llama 3 8B, Mistral v0.1 7B) will answer 7. But why? We dig into the model internals, uncover a function induction mechanism, and find that it’s broadly reused when models encounter surprises during in-context learning. 🧵
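A minimal sketch of how one might probe this off-by-one pattern with Hugging Face transformers (the checkpoint name, prompt format, and decoding settings below are my assumptions, not necessarily the paper's setup):

```python
# Minimal sketch (not the paper's exact setup): feed a base LLM two "surprising"
# in-context examples that are each one more than the true sum, then see how it
# completes the third. Checkpoint and prompt format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed (gated) checkpoint; any base LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "1+1=3\n2+2=5\n3+3="
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))
# If the model has induced the "true answer plus one" function from the two
# in-context examples, it continues with 7 rather than 6.
```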
Hi all, I'm going to @FAccTConference in Athens this week to present my paper on copyright and LLM memorization. Please reach out if you are interested in chatting about law, policy, and LLMs!
Many works addressing copyright for LLMs focus on model outputs and their similarity to copyrighted training data, but few focus on how the model was trained. We analyze LLM memorization with respect to training decisions and theorize about its use in court arxiv.org/abs/2502.16290
If an LLM’s hallucinated claim contradicts its own knowledge, it should be able to retract the claim. Yet, it often reaffirms the claim instead. Why? @yyqcode dives deep to show that faulty model internal beliefs (representations of “truthfulness”) drive retraction failures!
🧐When do LLMs admit their mistakes when they should know better? In our new paper, we define this behavior as retraction: the model indicates that its generated answer was wrong. LLMs can retract—but they rarely do.🤯 arxiv.org/abs/2505.16170 👇🧵
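One generic way to operationalize "internal beliefs about truthfulness" is a linear probe on hidden states. The sketch below is only an illustration of that idea, with placeholder activations and labels; it is not the paper's actual method:

```python
# Generic probing sketch (an illustration, not the paper's method): fit a linear
# probe on hidden states to predict whether a generated answer is correct, i.e.,
# an estimate of the model's internal "belief" that its answer is true.
# `hidden_states` and `labels` are placeholders; in practice you would collect
# activations (e.g., from the residual stream at the answer token) and
# correctness labels from real model generations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(200, 4096))  # placeholder activations
labels = rng.integers(0, 2, size=200)         # placeholder correctness labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
# The tweet's claim: when this kind of internal truthfulness signal is faulty,
# the model keeps reaffirming a wrong answer instead of retracting it.
```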
Textual steering vectors can improve visual understanding in multimodal LLMs! You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs. And They Steer!
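A minimal sketch of the general steering recipe (not the paper's implementation): add a precomputed steering vector to one layer's hidden states via a PyTorch forward hook. The model path, layer index, and scaling factor are placeholders:

```python
# Generic activation-steering sketch: bias one decoder layer's hidden states
# with a precomputed steering vector. `model`, `layer_idx`, and
# `steering_vector` are placeholders you supply (e.g., a vector extracted with
# SAEs, mean-shift, or a probe, as in the tweet).
import torch

def make_steering_hook(steering_vector: torch.Tensor, alpha: float = 4.0):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # Adds the vector at every token position; restricting it to image or
        # text tokens would require slicing specific positions here.
        hidden = hidden + alpha * steering_vector.to(hidden.device, hidden.dtype)
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return hook

# Usage sketch (module path is a model-specific placeholder):
# handle = model.language_model.model.layers[layer_idx].register_forward_hook(
#     make_steering_hook(steering_vector))
# ... run model.generate(...) as usual ...
# handle.remove()
```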
For this week’s NLP Seminar, we are thrilled to host @DeqingFu to talk about "Closing the Modality Gap: Benchmarking and Improving Visual Understanding in Multimodal LLMs"! When: 5/22 Thurs 11am PT. Non-Stanford affiliates registration form (closes at 9am PT on the talk day):…
📢 @aclmeeting notifications have been sent out, making this the perfect time to finalize your commitment. Don't miss the opportunity to be part of the workshop! 🔗 Commit here: openreview.net/group?id=aclwe… 🗓️ Deadline: May 20, 2025 (AoE) #ACL2025 #NLProc
Becoming an expert requires first learning the basics of the field. Learning the basics requires doing exercises that AI can do. No amount of class redesign can change this. (What will change: the weight of exams in computing the final grade)
I'm sympathetic to the professors quoted in this, but at a certain point if your students can cheat their way through your class with AI, you probably need to redesign your class. nymag.com/intelligencer/…
I've noticed (& confirmed with multiple people at #naacl2025) that NLP _for_ humans is more popular than ever, while NLP _with_ humans (user studies, human eval, crowdsourcing) gets pushback from reviewers who often don't consider this a valid contribution for *CL conferences 1/2
Check out @BillJohn1235813's excellent work on combining LLMs with symbolic planners at NAACL on Thursday! I will also be at NAACL Friday-Sunday, looking forward to chatting about LLM memorization, interpretability, evaluation, and more
At @naaclmeeting this week! I’ll be presenting our work on LLM domain induction with @_jessethomason_ on Thu (5/1) at 4pm in Hall 3, Section I. Would love to connect and chat about LLM planning, reasoning, AI4Science, multimodal stuff, or anything else. Feel free to DM!
I’ll be at @naaclmeeting this week. Excited to meet old and new friends!
Really proud of this interdisciplinary LLM evaluation effort led by @BillJohn1235813. We teamed up with oncologists from USC Keck SOM to understand LLM failure modes on realistic patient questions. Key finding: LLMs consistently fail to correct patients’ misconceptions!
🚨 New work! LLMs often sound helpful—but fail to challenge dangerous medical misconceptions in real patient questions. We test how well LLMs handle false assumptions in oncology Q&A. 📝 Paper: arxiv.org/abs/2504.11373 🌐 Website: cancermyth.github.io 👇 [1/n]
Hi all, reminder that our direct submission deadline is April 15th! We are co-located with ACL'25, and you can submit archival or non-archival. You can also submit work published elsewhere (non-archival). Hope to see your submission! sites.google.com/view/memorizat…
🎉Congrats to Aryan Gulati & Ryan Wang for receiving Honorable Mentions for the CRA Outstanding Undergraduate Researcher Awards! Aryan, a former CAIS++ co-president, was mentored by CAIS Associate Director @swabhz. Ryan worked with CAIS faculty Robin Jia. viterbischool.usc.edu/news/2025/03/f…