Zora Wang
@ZhiruoW
PhD student @LTIatCMU + visiting @StanfordNLP | prev @Amazon Alexa AI, @Microsoft Research Asia | fun 👩🏻💻 🐈 💃 🪴 🎶
If you're attending #ICML2025, check out our:
💭 Agent Workflow Memory for online adaptive agents: Jul 17 4:30–7pm @ West Hall
🔎 RAGGED for designing scalable and stable RAG systems: Jul 16 11:00–13:30 @ East Hall
Computer Use Agent Workshop on Jul 19
🌐 "Universal Retrieval for…
🇦🇹 I'll be at #ACL2025! Recently I've been thinking about:
✨ linguistically + cognitively motivated evals (as always!)
✨ understanding multilingualism + representation learning (new!)
I'll also be presenting a poster for BehaviorBox on Wed @ Poster Session 4 (Hall 4/5, 10–11:30)!
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs: 🧵1/9
A short 📹 explainer video on how LLMs can overthink in humanlike ways 😲! had a blast presenting this at #icml2025 🥳
🥳 Gap year update: I'll be joining @allen_ai/@UW for 1 year (Sep2025-Jul2026 -> @JHUCompSci) & looking forward to working with amazing folks there, incl. @RanjayKrishna, @HannaHajishirzi, Ali Farhadi. 🚨 I’ll also be recruiting PhD students for my group at @JHUCompSci for Fall…
Sharing some personal updates 🥳: - I've completed my PhD at @unccs! 🎓 - Starting Fall 2026, I'll be joining the Computer Science dept. at Johns Hopkins University (@JHUCompSci) as an Assistant Professor 💙 - Currently exploring options + finalizing the plan for my gap year (Aug…
The real insight: 99% of the world's best ideas are trapped in the heads of people who can't code. They have problems. They know the solutions. They just can't build them. AI shouldn't help engineers code faster. It should help everyone else build.
Life Update: I will join @UTiSchool as an Assistant Professor in Fall 2026 and will continue my work on LLMs, HCI, and Computational Social Science. I'm building a new lab on Human-Centered AI Systems and will be hiring PhD students in the coming cycle!
Proud and happy to see OpenAgentSafety coming out! Further pushing the frontier of interactional safety risks in human-AI agent collaboration. Kudos to @sanidhya903 and @Aditya_Soni_8 who led the projects!
1/ AI agents are increasingly being deployed for real-world tasks, but how safe are they in high-stakes settings? 🚨 NEW: OpenAgentSafety - A comprehensive framework for evaluating AI agent safety in realistic scenarios across eight critical risk categories. 🧵
Poster session switched to today!
1) Agent Workflow Memory. Allow agents to adapt online to carry out new tasks more accurately by inducing workflows for common sub-tasks. Today (Wed 7/17): 4:30-7pm. West Exhibition Hall B2-B3 W-202. Also at the CUA workshop, morning of Sat 7/19.
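The workflow-induction idea above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the names (`WorkflowMemory`, `induce`, `expand`) and the fixed-length subsequence mining are my own simplifications of "inducing workflows for common sub-tasks" from past trajectories.

```python
from collections import Counter

class WorkflowMemory:
    """Toy workflow memory: mine action subsequences shared across
    successful trajectories, then reuse them as named workflows."""

    def __init__(self, min_support=2, length=3):
        self.min_support = min_support  # trajectories that must share a subsequence
        self.length = length            # fixed subsequence length for this sketch
        self.workflows = []

    def induce(self, trajectories):
        """Keep every length-k action subsequence that appears in at
        least `min_support` distinct trajectories."""
        counts = Counter()
        for traj in trajectories:
            seen = set()
            for i in range(len(traj) - self.length + 1):
                seen.add(tuple(traj[i:i + self.length]))
            counts.update(seen)
        self.workflows = [list(w) for w, c in counts.items()
                          if c >= self.min_support]

    def expand(self, plan):
        """Replace a workflow reference like 'wf:0' with its actions."""
        out = []
        for step in plan:
            if step.startswith("wf:"):
                out.extend(self.workflows[int(step[3:])])
            else:
                out.append(step)
        return out

# Two logged trajectories that share a login sub-task.
t1 = ["open_site", "click_login", "type_user", "type_pass", "submit", "search"]
t2 = ["open_site", "click_login", "type_user", "type_pass", "submit", "checkout"]

mem = WorkflowMemory(min_support=2, length=3)
mem.induce([t1, t2])
# The shared subsequences (the login flow) are now reusable as 'wf:i'.
```

The point of the sketch: the memory is built online from the agent's own trajectories, so no offline training data is needed before the reusable sub-task appears.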
Excited to be hanging out today at @WiMLworkshop 👩🏻💻 Come say hi during the poster session 🕝 2:45–3:30pm 📍 West Meeting Room 211–214 Let’s chat about how coding agents are changing developer workflows! 🤖💻🔧✨
Some updates 🚨 I finished my Ph.D. at @uwcse in June 2025! After a year at AI2 as a Research Scientist, I am joining CMU @LTIatCMU & @mldcmu (courtesy) as an Assistant Professor in Fall 2026. The journey, acknowledgments & recruiting in 🧵
LLMs do not have a native mechanism for "learning" from experience or data. Long context and weight updates (e.g. via RL or SFT) are tools we can use to help enable learning, but they are not the ultimate solution. To build LLM agents that can actually learn, you need a "context…
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
User simulators bridge RL with real-world interaction // jessylin.com/2025/07/10/use… How do we get the RL paradigm to work on tasks beyond math & code? Instead of designing datasets, RL requires designing environments. Given that most non-trivial real-world tasks involve…
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
ASI is now accepted to @COLM_conf #COLM2025! 🍁 🔗 arxiv.org/abs/2504.06821
Meet ASI: Agent Skill Induction A framework for online programmatic skill learning — no offline data, no training. 🧠 Build reusable skills at test time 📈 +23.5% success, +15.3% efficiency 🌐 Scales to long-horizon tasks, transfers across websites Let's dive in! 🧵
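To make the "programmatic skill learning" idea concrete, here is a minimal sketch, with hypothetical names (`SkillLibrary`, `induce_skill`) that are not the paper's API: a verified action trace from one successful task is abstracted into a parameterized skill, then reused with new arguments.

```python
class SkillLibrary:
    """Toy skill induction: wrap a successful action trace into a
    parameterized, callable template and reuse it on later tasks."""

    def __init__(self):
        self.skills = {}

    def induce_skill(self, name, trace, params):
        """Abstract a concrete trace: values listed in `params`
        become named fill-in slots like '<query>'."""
        self.skills[name] = [f"<{params[step]}>" if step in params else step
                             for step in trace]

    def run(self, name, **kwargs):
        """Instantiate a stored skill with new arguments."""
        out = []
        for step in self.skills[name]:
            if step.startswith("<") and step.endswith(">"):
                out.append(kwargs[step[1:-1]])
            else:
                out.append(step)
        return out

lib = SkillLibrary()
# One successful search trajectory, observed at test time.
trace = ["focus_searchbox", "type:shoes", "press_enter"]
lib.induce_skill("search", trace, {"type:shoes": "query"})
# Reuse the induced skill with a different query, as on a new task:
actions = lib.run("search", query="type:laptops")
```

The design point this illustrates: skills accumulate during deployment from the agent's own successes, which is what lets them transfer to new long-horizon tasks without any offline training phase.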
📣Excited to announce that the 4th #DL4C workshop “Deep Learning for Code in the Agentic Era" is coming to @NeurIPSConf 2025! AI coding agents are transforming software development at an unprecedented pace. Join us to explore the cutting edge of agent-based programming,…
What will software development look like in 2026? With coding agents rapidly improving, dev roles may look quite different. My current workflow has changed a lot: - Work in github, not IDEs - Agents in parallel - Write English, not code - More code review Thoughts + a video👇
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
seems big AI labs are hyperfixating on reasoning when they should focus on *memory* instead normal people won't use models that can think for hours to solve hard math problems people want models that learn over time, remember details, adapt and interact like a person would