Raj Movva
@rajivmovva
PhD student @Berkeley_AI. ML & society, interpretability, health. @MIT '22.
💡New preprint & Python package: We use sparse autoencoders to generate hypotheses from large text datasets. Our method, HypotheSAEs, produces interpretable text features that predict a target variable, e.g. features in news headlines that predict engagement. 🧵1/
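For intuition, here is a minimal sketch of the idea (my own illustration, not the actual HypotheSAEs API): train a sparse autoencoder on text embeddings, then rank the learned features by how strongly they correlate with the target variable. The top features are candidate hypotheses, labeled by inspecting the texts that activate them.

```python
# Minimal sketch of SAE-based hypothesis generation (hypothetical names,
# not the HypotheSAEs package API).
import numpy as np
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_embed, n_features)
        self.decoder = nn.Linear(n_features, d_embed)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse, nonnegative feature activations
        return self.decoder(z), z

def train_sae(embeddings: torch.Tensor, n_features=1024, l1=1e-3, epochs=50):
    sae = SparseAutoencoder(embeddings.shape[1], n_features)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
    for _ in range(epochs):
        recon, z = sae(embeddings)
        # reconstruction loss plus an L1 penalty to encourage sparsity
        loss = ((recon - embeddings) ** 2).mean() + l1 * z.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae

def top_predictive_features(sae, embeddings, y, k=10):
    # Rank features by correlation with the target (e.g. engagement);
    # the top-k are candidate hypotheses to interpret and label.
    with torch.no_grad():
        _, z = sae(embeddings)
    z = z.numpy()
    corrs = np.array([
        np.corrcoef(z[:, j], y)[0, 1] if z[:, j].std() > 0 else 0.0
        for j in range(z.shape[1])
    ])
    return np.argsort(-np.abs(corrs))[:k]
```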
Atul Butte died yesterday. The world lost a giant. A big bear of a man. With a huge smile. With love for everyone. With energy that could power a room. I loved everything about Atul. I loved how he was always happy. I loved how excited he was about science and helping people.
i wake up. something’s wrong with the clock on the wall. the numbers are jumbled. my hands aren’t right. i tell my wife. she responds: “that’s not just an observation—it’s a powerful insight.” i scream.
I've resolved this positively: 2 papers convincingly show sparse autoencoders beating baselines on real tasks: Hypothesis Generation & Auditing LLMs. SAEs shine when you don't know what you're looking for, but lack precision. Sometimes the right tool for the job, sometimes not.
Manifold Market: Will Sparse Autoencoders be successfully used on a downstream task in the next year and beat baselines? Stephen Grugett asked me for alignment-relevant markets, this was my best idea. I think SAEs are promising, but how far can they go? manifold.markets/NeelNanda/will…
We're presenting two papers Wednesday at #ICML2025, both at 11am. Come chat about "Sparse Autoencoders for Hypothesis Generation" (west-421), and "Correlated Errors in LLMs" (east-1102)! Short thread ⬇️
1. We will present HypotheSAEs at #ICML2025, Wednesday 11am (West Hall B2-B3 #W-421).
2. Let me know if you'd like to chat about:
- AI for hypothesis generation
- why SAEs are still useful
- whether PhD students should stay in school

This is a nice experiment: if you finetune an Othello next-move-predictor to reconstruct the board from its internal state, the reconstructed boards are often incorrect, but they have the same next moves as the true board! So next token prediction might be "too easy", in that a…
We fine-tune an Othello next-token prediction model to reconstruct boards. Even when the model reconstructs boards incorrectly, the reconstructed boards often get the legal next moves right. Models seem to construct "enough of" the board to calculate single next moves.
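A hedged sketch of the evaluation step being described: compare the legal next moves of the true board against those of the reconstructed board. The board encoding here (8x8 array, 0 = empty, 1 = player to move, -1 = opponent) is my assumption for illustration, not the paper's.

```python
# Check whether a reconstructed Othello board yields the same legal next
# moves as the true board, even if the two boards differ elsewhere.
import numpy as np

DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def legal_moves(board: np.ndarray) -> set:
    moves = set()
    for r in range(8):
        for c in range(8):
            if board[r, c] != 0:
                continue
            for dr, dc in DIRS:
                # a move is legal if it flanks a line of opponent pieces
                # ending in one of the player's own pieces
                rr, cc = r + dr, c + dc
                seen_opp = False
                while 0 <= rr < 8 and 0 <= cc < 8 and board[rr, cc] == -1:
                    seen_opp = True
                    rr += dr
                    cc += dc
                if seen_opp and 0 <= rr < 8 and 0 <= cc < 8 and board[rr, cc] == 1:
                    moves.add((r, c))
                    break
    return moves

def same_legal_moves(true_board: np.ndarray, recon_board: np.ndarray) -> bool:
    return legal_moves(true_board) == legal_moves(recon_board)
```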
The reaction to this result shouldn't just be "what about Opus 4 / o5-ultrapro etc". For example, one takeaway is that human-AI collaboration for coding (what we want) isn't aligned with task-completion benchmarks (what we measure), and we should try to understand why!
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using responses of >350 LLMs. We find substantial correlation. On one dataset, LLMs agree on the wrong answer ~2x more than they would at random. 🧵(1/7)
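A rough illustration of the measurement (my own sketch, not the paper's code): among questions two models both get wrong, how often do they pick the same wrong option, compared with the chance rate if errors were independent and uniform over the wrong options?

```python
# responses[m] is model m's chosen option index per question;
# gold is the correct option index per question.
import numpy as np
from itertools import combinations

def wrong_answer_agreement(responses: dict, gold: np.ndarray, n_options=4):
    observed, chance = [], []
    for a, b in combinations(responses, 2):
        ra, rb = responses[a], responses[b]
        both_wrong = (ra != gold) & (rb != gold)
        if both_wrong.sum() == 0:
            continue
        # observed: among questions both models miss, fraction where they
        # choose the same wrong option
        observed.append((ra[both_wrong] == rb[both_wrong]).mean())
        # chance baseline: independent uniform choice among wrong options
        chance.append(1.0 / (n_options - 1))
    return np.mean(observed), np.mean(chance)
```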
individual reporting for post-deployment evals — a little manifesto (& new preprints!) tldr: end users have unique insights about how deployed systems are failing; we should figure out how to translate their experiences into formal evaluations of those systems.
Cool exploration of how pretraining data shapes LLMs on medical topics. One result: clinical jargon doesn't show up much in pretraining, but is prevalent in clinical notes (a much-discussed use case for LLMs). Nice use of @allen_ai's What's In My Big Data tool.
🩺 Open-source large language models now perform well across various clinical natural language processing tasks, even though they never see electronic health records. Where do they pick up that clinical knowledge? 🚀 We are excited to share our CHIL 2025 paper “Diagnosing our…
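For concreteness, a toy sketch of the kind of frequency comparison involved (not the WIMBD tool itself; the jargon list is hypothetical): count how often clinical abbreviations appear per million tokens in a corpus, and compare across corpora.

```python
# Compare clinical-jargon frequency (per million tokens) between corpora,
# e.g. a pretraining sample vs. a sample of clinical notes.
import re
from collections import Counter

JARGON = ["pt", "hx", "sob", "c/o", "npo"]  # hypothetical abbreviation list

def per_million(texts: list[str], terms: list[str]) -> dict:
    tokens = [t for doc in texts for t in re.findall(r"[a-z/]+", doc.lower())]
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {term: 1e6 * counts[term] / total for term in terms}
```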
Today is a good day to share my favorite KD fact, which is that he invested in @huggingface all the way back in... 2017. A true champion of open source AI 👑
Excited for the new release of #HuggingFace durant.ly/huggingface - proud investor! durant.ly/hfrelease
Saddened by @atulbutte's passing. When I was in high school, he was gracious enough to do an interview for our student-run science magazine. There are many good reasons he could've said no, but he chose to spend his time inspiring teenagers. It worked on me. RIP, Prof. Butte.

New work 🎉: conformal classifiers return sets of classes for each example, with a probabilistic guarantee the true class is included. But these sets can be too large to be useful. In our #CVPR2025 paper, we propose a method to make them more compact without sacrificing…
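Background sketch of the conformal setup the paper starts from (standard split conformal prediction, not the paper's new compaction method): calibrate a score threshold on held-out data so that the returned set contains the true class with probability at least 1 - alpha.

```python
# Split conformal prediction for classification: calibrate on held-out
# softmax scores, then build prediction sets for new examples.
import numpy as np

def conformal_threshold(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha=0.1):
    # nonconformity score: 1 minus the probability assigned to the true class
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # finite-sample-corrected quantile gives the coverage guarantee
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return q

def prediction_set(probs: np.ndarray, qhat: float) -> np.ndarray:
    # include every class whose nonconformity score is within the threshold
    return np.where(1.0 - probs <= qhat)[0]
```

The guarantee holds on average over calibration and test draws; the sets can still be large when the classifier is uncertain, which is the problem the paper addresses.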