Bowen Jiang (Lauren)
@laurenbjiang
CS PhD @Penn | Research Intern @Microsoft OAR | @Argonne | @UofIllinois | Foundation Models | Post-Training | Personalization | Evaluation | She/Her
🚀 How well can LLMs know you and personalize your responses? Turns out, not so much! Introducing the PersonaMem Benchmark -- 👩🏻💻Evaluate LLMs' ability to understand evolving personas from 180+ multi-session user-chatbot conversation histories 🎯Latest models (GPT-4.1, GPT-4.5,…


I think, I am aware; therefore, I am.
On 4/15, as part of @PennEngineers AI Month 2025, @WarrenCntrPenn faculty affiliate Chris Callison-Burch will speak on a panel on how #AI is reshaping our understanding of intelligence and what it means to be human. Join us! pennengdean.wufoo.com/forms/r1m9r92z…
PersonaMem has been accepted to COLM 2025! @COLM_conf
Anthropic announced they've activated "AI Safety Level 3 Protections" for their latest model. What does this mean, and why does it matter? Let me share my perspective as OpenAI's former lead for dangerous capabilities testing. (Thread)
🚀New feature update in Memobase #Playground! ✅ 10 tagged memory #examples (dating, tech support, therapy...) ✅ Real persona + event traces ♥️ Shout-out to PersonaMem by the @laurenbjiang team, a strong #benchmark for AI #memory! Try it → app.memobase.io/playground/exa… #AI #LLM
Want to 𝐜𝐮𝐭 𝐑𝐅𝐓 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐭𝐢𝐦𝐞 𝐛𝐲 𝐮𝐩 𝐭𝐨 𝟐× and boost performance? 🚀 Meet 𝑨𝒅𝒂𝑹𝑭𝑻 — a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithm (PPO, GRPO, REINFORCE). Less compute. Better results. 🧵 1/n
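A minimal sketch of what difficulty-based curriculum sampling for RFT can look like: the sampler keeps a moving "target difficulty" and nudges it up when recent rollouts succeed, down when they fail. The class name, thresholds, and update rule here are illustrative assumptions, not the AdaRFT paper's actual algorithm.

```python
# Sketch only: adaptive difficulty-based curriculum sampling for an RFT loop.
# The selection rule and constants are assumptions, not AdaRFT's exact method.
import random

class CurriculumSampler:
    def __init__(self, problems, target=0.2, step=0.05, window=64):
        # problems: list of (problem, difficulty in [0, 1]) pairs
        self.problems = problems
        self.target = target      # current target difficulty
        self.step = step          # how fast the target moves
        self.window = window      # size of the rolling reward window
        self.recent = []          # rolling record of rewards

    def sample(self, batch_size):
        # Prefer problems whose difficulty is closest to the current target.
        ranked = sorted(self.problems, key=lambda p: abs(p[1] - self.target))
        pool = ranked[: max(batch_size * 4, batch_size)]
        return random.sample(pool, batch_size)

    def update(self, rewards):
        # Succeeding often -> shift toward harder problems; failing often -> ease off.
        self.recent.extend(rewards)
        self.recent = self.recent[-self.window:]
        success = sum(self.recent) / len(self.recent)
        if success > 0.7:
            self.target = min(1.0, self.target + self.step)
        elif success < 0.3:
            self.target = max(0.0, self.target - self.step)

# Usage inside an RFT loop (PPO / GRPO / REINFORCE): sample a batch, roll out,
# score with the reward function, then feed the rewards back to the sampler.
sampler = CurriculumSampler([(f"problem-{i}", i / 100) for i in range(100)])
batch = sampler.sample(8)
rewards = [float(random.random() < 0.5) for _ in batch]  # stand-in for verifier rewards
sampler.update(rewards)
```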
Vision and contact dynamics are both heavily influenced by geometry, so why do we treat them as separate problems? By combining vision with physics, "Vysics," each informs the other and we can generate accurate shape reconstructions despite major visual occlusions.
👥 Frustrated by chatbots that remember what you said but never use that memory correctly? We evaluated 15 SoTA LLM chatbots' ability to apply memory about a user in downstream tasks. Introducing: 𝑷𝒆𝒓𝒔𝒐𝒏𝒂𝑴𝒆𝒎 Benchmark 🥇Gemini 1.5, GPT4.5, GPT4.1 are in the lead ⬇️
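To make the evaluation idea concrete, here is a toy sketch of the kind of check a benchmark like this performs: give the model a multi-session conversation history, ask a question whose correct answer depends on a user fact stated earlier, and score a multiple-choice response. `call_chatbot`, the sessions, and the options are hypothetical stand-ins, not PersonaMem's actual data or harness.

```python
# Sketch of a memory-application check over multi-session chat history.
# `call_chatbot` is a hypothetical stand-in for the model API under test.
def build_prompt(sessions, question, options):
    history = "\n\n".join(
        f"[Session {i + 1}]\n" + "\n".join(turns) for i, turns in enumerate(sessions)
    )
    letters = "ABCD"
    choices = "\n".join(f"{letters[j]}. {opt}" for j, opt in enumerate(options))
    return f"{history}\n\nUser: {question}\nAnswer with one letter.\n{choices}"

def evaluate(call_chatbot, examples):
    correct = 0
    for ex in examples:
        prompt = build_prompt(ex["sessions"], ex["question"], ex["options"])
        answer = call_chatbot(prompt).strip()[:1].upper()
        correct += answer == ex["gold"]
    return correct / len(examples)

example = {
    "sessions": [
        ["User: I recently switched to a vegetarian diet.", "Assistant: Noted!"],
        ["User: Work has been stressful lately.", "Assistant: Sorry to hear that."],
    ],
    "question": "Can you suggest a dinner recipe for me?",
    "options": ["Grilled steak", "Mushroom risotto", "Chicken curry", "Shrimp pasta"],
    "gold": "B",  # the only option consistent with the earlier vegetarian preference
}
print(evaluate(lambda prompt: "B", [example]))  # 1.0 with a stand-in model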
This ICLR is the best conference ever. Attendees are extremely friendly and cuddly. ..What do you mean this is the wrong hall?
Mixture of Experts (MoE) is a popular architecture that uses different "experts" to improve Transformer models. The visual below explains how they differ from Transformers. Let's dive in to learn more about MoE!
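Since the visual itself is not reproduced here, a minimal code sketch of the core idea: an MoE layer replaces the single feed-forward block with several expert blocks plus a router that sends each token to only its top-k experts. The layer sizes and top-k choice below are illustrative assumptions, not any particular model's configuration.

```python
# Minimal top-k gated Mixture-of-Experts feed-forward layer (illustrative sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=4, top_k=2):
        super().__init__()
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                          # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep the k best experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Route each token only through its selected experts and mix the results.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(2, 5, 64)
print(MoELayer()(x).shape)  # torch.Size([2, 5, 64])
```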
GPT-4.5 stands out with the best performance on our new personalization benchmark, on par with Gemini 1.5, even without this new memory module. Our paper PersonaMem is currently being uploaded to arXiv. Stay tuned!
Starting today, memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on your preferences and interests to make it even more helpful for writing, getting advice, learning, and beyond.