Max Zuo
@max_zuo
Ph.D. Student @BrownCSDept | Generalization via Limited Supervision & RL | Advisors: @mlittmancs & @stevebach. Previously @gtcomputing & @google.
Ever wonder if LLMs use tools🛠️ the way we ask them? We explore LLMs using classical planners: are they writing *correct* PDDL (planning) problems? Say hi👋 to Planetarium🪐, a benchmark of 132k natural language & PDDL problems. 📜 Preprint: arxiv.org/abs/2407.03321 🧵1/n

We're happy to announce that effective as of July 1, 2025, faculty members @stevebach and @drsrinathsridha have received named chairs. Steve is now the Eliot Horowitz Assistant Professor in CS and Srinath is the John E. Savage Assistant Professor in CS: cs.brown.edu/news/2025/06/0…
Excited to present our paper arxiv.org/abs/2407.03321 at #NAACL this Friday, May 2, at 10am in Ballroom A! If you're interested in LLMs and Planning, I hope you'll join us to hear about our work!
I will be at #ICLR2025 in a few days to present this work with @surajk610! Feel free to DM me if you want to chat about mechinterp, cognitive science, or anything else!
How robust are in-context algorithms? In new work with @michael_lepori, @jack_merullo, and @brown_nlp, we explore why in-context learning disappears over training and fails on rare and unseen tokens. We also introduce a training intervention that fixes these failures.
I started a blog! First post is everything I know about setting up (fast, reproducible, error-proof) Python project environments using the latest tools. These methods have saved me a lot of grief. Also a short guide to CUDA in appendix :) blog.apoorvkh.com/posts/project-…
If we guide the activation in the ‘right’ part of the subspace, we can improve performance pretty dramatically, although we don’t completely fix the problem.
Using the composition score, we find two highly communicating heads, then use text corpus data to find highly activating contexts. In this case, we look at a component in head 8.3 that composes very strongly with a mover head.
Round-up 🧵 of our papers at #EMNLP2024: Reach out to the lead authors if you'd like to chat! #1 If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions arxiv.org/abs/2403.16442 Tue Nov. 12 11am Jasmine
🤔How do multilingual LLMs encode structural similarities across languages? 🌟We find that LLMs use identical circuits when languages share the same morphosyntactic processes. However, they recruit specialized components to handle tasks that involve language-specific linguistic features⤵️
Not only can't LLMs plan, they can't even generate specifications of a problem (in PDDL) that a standard planner could solve.