Aaron Mueller

@amuuueller

Asst. Prof. in CS at @BU_Tweets ≡ {Mechanistic, causal} {interpretability, computational linguistics} ≡ Formerly: PhD @jhuclsp

Boston, MA

Joined September 2015

716 Following

2K Followers

Pinned

Aaron Mueller@amuuueller · Apr 3, 2024

Excited this project is out! Using sparse feature circuits, we can explain and modify how LMs arrive at a behavior. In this thread, I want to highlight open directions where computational linguists can use sparse feature circuits. 🧵

Samuel Marks@saprmarks · Apr 3, 2024

Can we understand & edit unanticipated mechanisms in LMs? We introduce sparse feature circuits, & use them to explain LM behaviors, discover & fix LM bugs, & build an automated interpretability pipeline! Preprint w/ @can_rager, @ericjmichaud_, @boknilev, @davidbau, @amuuueller

13.0K

Aaron Mueller Retweeted

Hadas Orgad @ ICML@OrgadHadas · Jul 17

We're presenting the Mechanistic Interpretability Benchmark (MIB) now! Come and chat - East 1205. Project led by @amuuueller @AtticusGeiger @sarahwiegreffe

6.0K

Aaron Mueller@amuuueller · Jul 17

If you're at #ICML2025, chat with me, @sarahwiegreffe, Atticus, and others at our poster 11am - 1:30pm at East #1205! We're establishing a 𝗠echanistic 𝗜nterpretability 𝗕enchmark. We're planning to keep this a living benchmark; come by and share your ideas/hot takes!

2.0K

Aaron Mueller Retweeted

Nikhil Prakash@nikhil07prakash · Jun 24

How do language models track mental states of each character in a story, often referred to as Theory of Mind? Our recent work takes a step in demystifying it by reverse engineering how Llama-3-70B-Instruct solves a simple belief tracking task, and surprisingly found that it…

566

622

94.0K

Aaron Mueller Retweeted

Jackson Petty@jowenpetty · Jun 9

How well can LLMs understand tasks with complex sets of instructions? We investigate through the lens of RELIC: REcognizing (formal) Languages In-Context, finding a significant overhang between what LLMs are able to do theoretically and how well they put this into practice.

100

17.0K

Aaron Mueller Retweeted

Joshua Rozner@jsrozner · Jun 9

BabyLM's first constructions: new study on usage-based language acquisition in LMs w/ @LAWeissweiler, @coryshain. Simple interventions show that LMs trained on cognitively plausible data acquire diverse constructions (cxns) @babyLMchallenge 🧵

3.0K

Aaron Mueller Retweeted

David Bau@davidbau · Jun 1

Dear MAGA friends, I have been worrying about STEM in the US a lot, because right now the Senate is writing new laws that cut 75% of the STEM budget in the US. Sorry for the long post, but the issue is really important, and I want to share what I know about it. The entire…

475

138

121.0K

Aaron Mueller Retweeted

Joe Stacey@_joestacey_ · May 27

We have a new paper up on arXiv! 🥳🪇 The paper tries to improve the robustness of closed-source LLMs fine-tuned on NLI, assuming a realistic training budget of 10k training examples. Here's a 60 second rundown of what we found!

9.0K

Aaron Mueller Retweeted

Tomer Ashuach@tomerashuach · May 27

🚨New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive data—emails, SSNs, URLs from their training. We propose a surgical method to unlearn it. 🧵👇w/@boknilev @mtutek 1/8

4.0K

Aaron Mueller Retweeted

Tal Haklay ✈️ACL@tal_haklay · May 19

Our paper "Position-Aware Circuit Discovery" got accepted to ACL! 🎉 Huge thanks to my collaborators🙏 @OrgadHadas @davidbau @amuuueller @boknilev See you in Vienna! 🇦🇹 #ACL2025 @aclmeeting

186

14.0K

Aaron Mueller Retweeted

Yonatan Belinkov@boknilev · May 15

BlackboxNLP will be co-located with #EMNLP2025 in Suzhou this November! 📷This edition will feature a new shared task on circuits/causal variable localization in LMs, details: blackboxnlp.github.io/2025/task If you're into mech interp and care about evaluation, please submit!

10.0K

Aaron Mueller Retweeted

Ethan Gotlieb Wilcox@weGotlieb · May 12

📣Paper Update 📣It’s bigger! It’s better! Even if the language models aren’t. 🤖New version of “Bigger is not always Better: The importance of human-scale language modeling for psycholinguistics” osf.io/preprints/psya…

5.0K

Aaron Mueller Retweeted

babyLM@babyLMchallenge · May 9

Close your books, test time! The evaluation pipelines are out, baselines are released and the challenge is on. There is still time to join and we are excited to learn from you on pretraining and the gaps between humans and models. *Don't forget to fast-eval on checkpoints

1.0K

Aaron Mueller@amuuueller · Apr 26

Presenting sparse feature circuits today at 3pm-5:30pm! Come say hi at poster #495

3.0K

Aaron Mueller Retweeted

Yanai Elazar@yanaiela · Apr 25

💡 New ICLR paper! 💡 "On Linear Representations and Pretraining Data Frequency in Language Models": We provide an explanation for when & why linear representations form in large (or small) language models. Led by @jack_merullo_ , w/ @nlpnoah & @sarahwiegreffe

213

130

27.0K