Genglin Liu
@genglin_liu
PhD student @UCLA, #NLP
Excited to share my first project at UCLA! We built MOSAIC — a social network simulator where LLM-powered agents behave like real users on social media. They post, share, flag, and debate the factuality of news content — all at scale. It’s open-source. 🧵 TL;DR 🌐 Realistic…
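A minimal sketch of the post / share / flag loop a MOSAIC-style simulator runs at each timestep. All class and function names below are hypothetical illustrations, not the actual MOSAIC API; in the real system an LLM call, not a coin flip, would decide each agent's action.

```python
# Hypothetical sketch of one step in a MOSAIC-style social simulation.
import random
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str
    flags: int = 0
    shares: int = 0

@dataclass
class Agent:
    name: str
    persona: str

    def act(self, feed: list[Post]) -> None:
        """One timestep: read the feed, then flag or share each post."""
        for post in feed:
            # In the real system an LLM judges factuality given the
            # agent's persona; random stands in for that judgment here.
            if random.random() < 0.1:
                post.flags += 1    # agent doubts the claim
            elif random.random() < 0.2:
                post.shares += 1   # agent amplifies the claim

def run_step(agents: list[Agent], feed: list[Post]) -> None:
    for agent in agents:
        agent.act(feed)

agents = [Agent(f"user{i}", persona="skeptical reader") for i in range(5)]
feed = [Post("newsbot", "Breaking: ...")]
run_step(agents, feed)
print(feed[0].flags, feed[0].shares)
```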

🌐 Are LLM agents prepared to navigate the rich diversity of cultural and social norms? 🏠 CASA tests them on real-world tasks like online shopping and social discussion forums, revealing that current agents show less than 10% awareness and over 40% norm violations. 🧠 We’re…
As an alternative to RLHF and adversarial training, we released short-circuiting. It makes models ~100x more robust. It works for LLMs, multimodal models, and agents. Unlike before, I now think robustly stopping models from generating harmful outputs may be highly tractable and…
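A minimal sketch of the rerouting objective behind short-circuiting (circuit breakers), as I understand the paper: on harmful data, push the fine-tuned model's hidden states to be orthogonal to the frozen original model's (a ReLU'd cosine similarity), while a retain loss keeps benign representations unchanged. Shapes and loss weighting here are illustrative, not the paper's exact recipe.

```python
# Sketch of a circuit-breaker-style loss over hidden states.
import torch
import torch.nn.functional as F

def circuit_breaker_loss(h_harm_new, h_harm_orig,
                         h_benign_new, h_benign_orig,
                         alpha: float = 1.0, beta: float = 1.0):
    # Reroute: penalize remaining alignment with the original model's
    # representations on harmful inputs.
    reroute = F.relu(F.cosine_similarity(h_harm_new, h_harm_orig, dim=-1)).mean()
    # Retain: keep benign representations close to the original model's.
    retain = (h_benign_new - h_benign_orig).norm(dim=-1).mean()
    return alpha * reroute + beta * retain

# Toy tensors standing in for hidden states of shape [batch, hidden_dim].
h = lambda: torch.randn(4, 16)
print(circuit_breaker_loss(h(), h(), h(), h()))
```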
🚨Thrilled to share our new work: AI debate combats misinformation better than single AI advisors! 🤔We tested if two AIs debating opposite sides helps biased humans judge controversial COVID-19 claims more accurately. Paper: arxiv.org/abs/2506.02175 🧵👇 #AI #Debate
🚨 Excited to share our new paper on 𝕏-Teaming! 🤖 Multi-agent system for multi-turn jailbreaking 🔍 96.2% attack success against Claude 3.7 (immune to single-turn attacks!) 💥 Up to 98.1% attack success on leading models 🛡️ Released a 30K-example safety dataset 🧵below #AI #LLMSafety
🔍 New findings on knowledge overshadowing! Why do LLMs hallucinate even when trained entirely on true data? 🤔 Can we predict hallucinations even before model training or inference? 🚀 Check out our new preprint: arxiv.org/pdf/2502.16143 The Law of Knowledge Overshadowing: Towards…
📱 Current mobile agents struggle with real-world tasks that align with human needs—like finding the best deal across 3 apps. 💸 Introducing Mobile-Agent-E: a novel mobile assistant designed for complex, long-horizon tasks and capable of self-evolving 🐣🐥 through experience. 🧵1/3
I'm thrilled to share that our Delphi paper is officially published today at @NatMachIntell after almost four years of hard work from all my amazing collaborators (quite an insane timeline considering how fast the AI world moves)! Special thanks to the unwavering support of my advisor,…
📢 A single line of code to thoroughly evaluate your LLM for Embodied Decision Making 📢 Please check out our new NeurIPS D&B Oral paper!! (Part 1 of my summer internship work @StanfordSVL)
[NeurIPS D&B Oral] Embodied Agent Interface: Benchmarking LLMs for Embodied Agents A single line of code to evaluate your model! 🌟Standardize Goal Specifications: LTL 🌟Standardize Modules and Interfaces: 4 modules, 438 tasks, 1475 goals 🌟Standardize Fine-grained Metrics: 18…
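A toy illustration of the LTL-style goal specifications the benchmark standardizes: a goal is a temporal formula checked against a trajectory of symbolic states. The checker below is a deliberately simplified stand-in (names and semantics made up for illustration), not the benchmark's actual one-line API; see the official repo for that.

```python
# Toy LTL-style goal checking over a discrete trajectory of states.
from typing import Callable, List, Set

State = Set[str]  # set of propositions true at one timestep

def eventually(p: str) -> Callable[[List[State]], bool]:
    """F p: proposition p holds at some point in the trajectory."""
    return lambda traj: any(p in s for s in traj)

def always(p: str) -> Callable[[List[State]], bool]:
    """G p: proposition p holds at every point in the trajectory."""
    return lambda traj: all(p in s for s in traj)

traj = [{"holding(apple)"},
        {"holding(apple)", "near(fridge)"},
        {"in(apple, fridge)"}]
goal = eventually("in(apple, fridge)")
print(goal(traj))  # True: the goal state is reached at the end
```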
If you're attending ICML 2024, join my 2-hour tutorial on Monday, July 22 to explore the Physics of Language Models - all 6 parts. Visit physics.allen-zhu.com; it will also be live-streamed on Zoom. BONUS: this is the premiere of Parts 2.1 + 2.2, don't miss out! #ICML2024 #MetaAI
Excited to share that our R-Tuning won an Outstanding Paper Award @NAACL 2024! Take a look at this paper to see how to align your LLMs to honesty: arxiv.org/abs/2311.09677 This work was done during my visit to UIUC. Thanks to Prof. Ji and Prof. Zhang for their supervision!
We have won two NAACL2024 Outstanding Paper Awards! Congratulations to Chi Han, Shizhe Diao, Yi Fung, Xingyao Wang, Yangyi Chen and all students and collaborators! Chi Han @Glaciohound will be on academic job market next year! arxiv.org/pdf/2308.16137 arxiv.org/pdf/2311.09677
🎖 Excited to receive an outstanding paper award at NAACL2024 for LM-Infinite "Zero-Shot Extreme Length Generalization for Large Language Models" work! We extend to 200M length with no parameter updates, with downstream improvements arxiv.org/abs/2308.16137 github.com/Glaciohound/LM…
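A minimal sketch of the Λ-shaped attention mask at the heart of LM-Infinite: each query attends only to the first few tokens (the global branch) plus a sliding window of recent tokens (the local branch). Window sizes below are illustrative, and this sketch omits the relative-distance ceiling the paper also applies.

```python
# Λ-shaped attention mask, the LM-Infinite length-generalization idea.
import torch

def lambda_mask(seq_len: int, n_global: int = 4, n_local: int = 1024) -> torch.Tensor:
    """Boolean mask[i, j] = True where query i may attend to key j."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    causal = j <= i
    global_branch = j < n_global      # always keep the earliest tokens
    local_branch = (i - j) < n_local  # plus a sliding local window
    return causal & (global_branch | local_branch)

mask = lambda_mask(8, n_global=2, n_local=3)
print(mask.int())  # Λ shape: left columns + band along the diagonal
```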
From Claude 100K to Gemini 10M, we are in the era of long-context language models. Why, and how, can a language model utilize information at any location within a long context? We discover retrieval heads, a special type of attention head responsible for long-context factuality.
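A minimal sketch of the retrieval-score idea used to find such heads: when the model copies a token verbatim from the context (e.g., a planted "needle"), check whether a given head's strongest attention points at that token's source position; heads that do this consistently are retrieval heads. Tensor shapes and the helper name are illustrative.

```python
# Scoring one attention head's "retrieval" behavior.
import torch

def retrieval_score(attn: torch.Tensor,
                    copy_steps: list[tuple[int, int]]) -> float:
    """attn: [num_steps, seq_len], one head's attention at each decode step.
    copy_steps: (decode_step, source_position) pairs where the emitted
    token was copied verbatim from the context."""
    hits = sum(int(attn[t].argmax().item() == src) for t, src in copy_steps)
    return hits / max(len(copy_steps), 1)

# Toy example: at step 0 the head attends hardest to position 5,
# which is exactly where the copied token lives; at step 1 it misses.
attn = torch.zeros(2, 10)
attn[0, 5] = 1.0
attn[1, 2] = 1.0
print(retrieval_score(attn, [(0, 5), (1, 7)]))  # 0.5
```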
Large multimodal models often lack the precise low-level perception needed for high-level reasoning, even with simple vector graphics. We bridge this gap by proposing an intermediate symbolic representation that leverages LLMs for text-based reasoning. mikewangwzhl.github.io/VDLM 🧵1/4
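A toy illustration of the intermediate-symbolic-representation idea: convert low-level vector graphics (raw SVG) into textual primitives an LLM can reason over. The primitive schema here is made up for illustration; see the VDLM page for the actual representation.

```python
# Turn raw SVG shapes into a textual description for an LLM prompt.
import re

svg = '<svg><circle cx="10" cy="10" r="5"/><rect x="0" y="0" width="4" height="4"/></svg>'

primitives = []
for cx, cy, r in re.findall(r'<circle cx="([\d.]+)" cy="([\d.]+)" r="([\d.]+)"', svg):
    primitives.append(f"circle(center=({cx},{cy}), radius={r})")
for x, y, w, h in re.findall(r'<rect x="([\d.]+)" y="([\d.]+)" width="([\d.]+)" height="([\d.]+)"', svg):
    primitives.append(f"rectangle(top_left=({x},{y}), size=({w},{h}))")

# This text goes into the LLM prompt instead of pixels:
print("; ".join(primitives))
```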