John Hewitt
@johnhewtt
Assistant Prof @columbia CS. Visiting Researcher @ Google DeepMind. PhD from @stanfordnlp. Language x Neural Nets.
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI

‼️Skibidi for Machines! :) Developing language 🔠 between humans🧒 and machines🤖 has long been a dream - a language that will help us expand what we know, communicate with machines better, and create machines that better align with us. With @johnhewtt's amazing…
Come chat with me at our ICML poster about interpretability as a communication problem, and the need to derive new words for referencing language model concepts! 4:30-7PM, East Exhibition Hall A-B, #E-500. "We Can't Understand AI Using our Existing Vocabulary"
I'll be at ICML this year! Reach out if:
- you want to chat -- great! -- sign up here calendar.app.google/qtDkRmS1uV3pLz… and/or DM me.
- you want to fund my lab @ Columbia -- also great! -- research into deeply understanding language models for alignment, safety, and performance. Email me.
I’m beginning to share notes from my upcoming fall 2025 NLP class, Columbia COMS 4705. First up, some notes to help students brush up on math. Vectors, matrices, eigenstuff, probability distributions, entropy, divergences, matrix calculus cs.columbia.edu/~johnhew/coms4…
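Not from the course notes themselves, but a minimal numpy sketch of two of the listed concepts, entropy and KL divergence, assuming discrete probability vectors:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (in nats); zero entries are skipped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.25, 0.25])
q = np.array([1/3, 1/3, 1/3])
print(entropy(p))          # about 1.04 nats
print(kl_divergence(p, q)) # nonnegative; zero iff p == q
```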
We (@_beenkim @johnhewtt @NeelNanda5 Noah Fiedel Oyvind Tafjord) propose a research direction called 🤖agentic interpretability: we can and should ask and help AI systems to build mental models of us, which in turn helps us build mental models of the LLMs. arxiv.org/abs/2506.12152…
I wrote a note on linear transformations and symbols that traces a common conversation/interview I've had with students. Outer products, matrix rank, eigenvectors, linear RNNs -- the topics are really neat, and lead to great discussions of intuitions. cs.columbia.edu/~johnhew//fun-…
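A quick numpy sketch, not taken from the note, of two facts it touches on: an outer product has rank one, and a linear RNN h_t = W h_{t-1} unrolls to h_t = W^t h_0, so its long-run behavior is governed by W's eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)

# Outer product u v^T has rank 1 for nonzero u, v.
u, v = rng.standard_normal(4), rng.standard_normal(4)
A = np.outer(u, v)
print(np.linalg.matrix_rank(A))  # 1

# Linear RNN h_t = W h_{t-1}  =>  h_t = W^t h_0.
W = rng.standard_normal((4, 4)) * 0.3   # scaled so the spectrum is likely inside the unit disk
eigvals = np.linalg.eigvals(W)
h0 = rng.standard_normal(4)
h10 = np.linalg.matrix_power(W, 10) @ h0
# If the spectral radius is below 1, the hidden state decays toward zero.
print(np.max(np.abs(eigvals)), np.linalg.norm(h10))
```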
I’ll be at NeurIPS for a bit! If you want to talk in person about a PhD in my lab at Columbia, book a slot here: calendar.app.google/RWkDQVvmXkUkxz… If your organization wants to fund LLM understanding/interpretability/control research, reach out to me!
📢 Join us tomorrow at 10 AM PST for the next DLCT talk featuring @johnhewtt! He’ll dive into "Instruction Following without Instruction Tuning"—exploring innovative approaches to model training and task generalization.
This was a really fun project. Fine-tuning a model on "" => response produces a model that can do instruction => response. What??
If I finetune my LM just on responses, without conditioning on instructions, what happens when I test it with an instruction? Or if I finetune my LM just to generate poems from poem titles? Either way, the LM will roughly follow new instructions! Paper: arxiv.org/pdf/2409.14254
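A hypothetical sketch of what the two finetuning setups could look like as training pairs; the field names and helper functions below are illustrative, not the paper's actual data format:

```python
# Illustrative only: field names and helpers are assumptions, not the paper's format.

def format_instruction_tuning(instruction, response):
    # Standard instruction tuning: condition on the instruction, train on the response.
    return {"prompt": instruction, "target": response}

def format_response_only(response):
    # Response-only tuning: the conditioning context is empty ("" => response).
    return {"prompt": "", "target": response}

data = [
    ("Summarize this article in one sentence.", "The article argues that ..."),
    ("Write a haiku about autumn.", "Leaves drift past my door / ..."),
]

instruction_tuned = [format_instruction_tuning(i, r) for i, r in data]
response_only = [format_response_only(r) for _, r in data]

# The surprising finding: finetuning only on the response_only-style pairs still
# yields a model that roughly follows unseen instructions at test time.
```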