Tom McCoy
@RTomMcCoy
Assistant professor @YaleLinguistics. Studying computational linguistics, cognitive science, and AI. He/him.
🤖🧠NOW OUT IN PNAS🧠🤖 Language models show many surprising behaviors. E.g., they can count 30 items more easily than 29. In Embers of Autoregression, we explain such effects by analyzing what LMs are trained to do: pnas.org/doi/10.1073/pn… Major updates since the preprint! 1/n

Transformer-based neural networks achieve impressive performance on coding, math & reasoning tasks that require keeping track of variables and their values. But how can they do that without explicit memory? 📄 Our new ICML paper investigates this in a synthetic setting! 🧵 1/13
One of the most exciting papers I've read in a while - highly recommended!🧠🤖 It gives a compelling account of how the human mind reasons so flexibly, and of what's missing from LLM reasoning
How do people reason while still staying coherent – as if they have an internal ‘world model’ for situations they’ve never encountered? A new paper on open-world cognition (preview at the world models workshop at #ICML2025!)
🤖New paper w/ @zh_herbert_zhou @SimonCharlow @bob_frank in ACL2025 & SCiL 💡We use core ideas from Dynamic Semantics to evaluate LLMs and find that they show human-like judgments on anaphora accessibility but rely on specific lexical cues under closer scrutiny. 🧵1/6
So much research is being done about LLMs that it's hard to stay on top of the literature. To help with this, I've made a list of all the most important papers from the past 8 years: rtmccoy.com/pubs/ I hope you enjoy!
Can coding agents autonomously implement AI research extensions? We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code. Finding: Most agents we tested had a low success rate, but there is promise!
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
How well can LLMs understand tasks with complex sets of instructions? We investigate through the lens of RELIC: REcognizing (formal) Languages In-Context, finding a significant gap between what LLMs can do in theory and how well they put this into practice.
The word "laundry" contains both steps of the laundry process: 1. Undry 2. Dry
Had a fun visit to UChicago/TTIC over the past couple days - really great group doing NLP & CompLing there!