Raj Movva
@rajivmovva
PhD student @Berkeley_AI. ML & society, interpretability, health. @MIT '22.
💡New preprint & Python package: We use sparse autoencoders to generate hypotheses from large text datasets. Our method, HypotheSAEs, produces interpretable text features that predict a target variable, e.g. features in news headlines that predict engagement. 🧵1/
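For intuition, here is a minimal sketch of the idea (my own illustration, not the actual HypotheSAEs API): train a sparse autoencoder on text embeddings, then rank the learned features by how strongly they correlate with the target variable. The top features are candidate hypotheses, labeled by inspecting the texts that activate them.

```python
# Minimal sketch of SAE-based hypothesis generation (hypothetical names,
# not the HypotheSAEs package API).
import numpy as np
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_embed, n_features)
        self.decoder = nn.Linear(n_features, d_embed)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse, nonnegative feature activations
        return self.decoder(z), z

def train_sae(embeddings: torch.Tensor, n_features=1024, l1=1e-3, epochs=50):
    sae = SparseAutoencoder(embeddings.shape[1], n_features)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
    for _ in range(epochs):
        recon, z = sae(embeddings)
        # reconstruction loss plus an L1 penalty to encourage sparsity
        loss = ((recon - embeddings) ** 2).mean() + l1 * z.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae

def top_predictive_features(sae, embeddings, y, k=10):
    # Rank features by correlation with the target (e.g. engagement);
    # the top-k are candidate hypotheses to interpret and label.
    with torch.no_grad():
        _, z = sae(embeddings)
    z = z.numpy()
    corrs = np.array([
        np.corrcoef(z[:, j], y)[0, 1] if z[:, j].std() > 0 else 0.0
        for j in range(z.shape[1])
    ])
    return np.argsort(-np.abs(corrs))[:k]
```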
Atul Butte died yesterday. The world lost a giant. A big bear of a man. With a huge smile. With love for everyone. With energy that could power a room. I loved everything about Atul. I loved how he was always happy. I loved how excited he was about science and helping people.
i wake up. something’s wrong with the clock on the wall. the numbers are jumbled. my hands aren’t right. i tell my wife. she responds: “that’s not just an observation—it’s a powerful insight.” i scream.
I've resolved this positively: 2 papers convincingly show sparse autoencoders beating baselines on real tasks: Hypothesis Generation & Auditing LLMs. SAEs shine when you don't know what you're looking for, but lack precision. Sometimes the right tool for the job, sometimes not.
Manifold Market: Will Sparse Autoencoders be successfully used on a downstream task in the next year and beat baselines? Stephen Grugett asked me for alignment-relevant markets, this was my best idea. I think SAEs are promising, but how far can they go? manifold.markets/NeelNanda/will…
We're presenting two papers Wednesday at #ICML2025, both at 11am. Come chat about "Sparse Autoencoders for Hypothesis Generation" (west-421), and "Correlated Errors in LLMs" (east-1102)! Short thread ⬇️
1. We will present HypotheSAEs at #ICML2025, Wednesday 11am (West Hall B2-B3 #W-421).
2. Let me know if you'd like to chat about:
- AI for hypothesis generation
- why SAEs are still useful
- whether PhD students should stay in school

This is a nice experiment: if you finetune an Othello next-move-predictor to reconstruct the board from its internal state, the reconstructed boards are often incorrect, but they have the same next moves as the true board! So next token prediction might be "too easy", in that a…
We fine-tune an Othello next-token prediction model to reconstruct boards. Even when the model reconstructs boards incorrectly, the reconstructed boards often get the legal next moves right. Models seem to construct "enough of" the board to calculate single next moves.
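A hedged sketch of the evaluation step being described: compare the legal next moves of the true board against those of the reconstructed board. The board encoding here (8x8 array, 0 = empty, 1 = player to move, -1 = opponent) is my assumption for illustration, not the paper's.

```python
# Check whether a reconstructed Othello board yields the same legal next
# moves as the true board, even if the two boards differ elsewhere.
import numpy as np

DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def legal_moves(board: np.ndarray) -> set:
    moves = set()
    for r in range(8):
        for c in range(8):
            if board[r, c] != 0:
                continue
            for dr, dc in DIRS:
                # a move is legal if it flanks a line of opponent pieces
                # ending in one of the player's own pieces
                rr, cc = r + dr, c + dc
                seen_opp = False
                while 0 <= rr < 8 and 0 <= cc < 8 and board[rr, cc] == -1:
                    seen_opp = True
                    rr += dr
                    cc += dc
                if seen_opp and 0 <= rr < 8 and 0 <= cc < 8 and board[rr, cc] == 1:
                    moves.add((r, c))
                    break
    return moves

def same_legal_moves(true_board: np.ndarray, recon_board: np.ndarray) -> bool:
    return legal_moves(true_board) == legal_moves(recon_board)
```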
The reaction to this result shouldn't just be "what about Opus 4 / o5-ultrapro etc". For example, one takeaway is that human-AI collaboration for coding (what we want) isn't aligned with task-completion benchmarks (what we measure), and we should try to understand why!
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using responses of >350 LLMs. We find substantial correlation. On one dataset, LLMs agree on the wrong answer ~2x more than they would at random. 🧵(1/7)
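A rough illustration of the measurement (my own sketch, not the paper's code): among questions two models both get wrong, how often do they pick the same wrong option, compared with the chance rate if errors were independent and uniform over the wrong options?

```python
# responses[m] is model m's chosen option index per question;
# gold is the correct option index per question.
import numpy as np
from itertools import combinations

def wrong_answer_agreement(responses: dict, gold: np.ndarray, n_options=4):
    observed, chance = [], []
    for a, b in combinations(responses, 2):
        ra, rb = responses[a], responses[b]
        both_wrong = (ra != gold) & (rb != gold)
        if both_wrong.sum() == 0:
            continue
        # observed: among questions both models miss, fraction where they
        # choose the same wrong option
        observed.append((ra[both_wrong] == rb[both_wrong]).mean())
        # chance baseline: independent uniform choice among wrong options
        chance.append(1.0 / (n_options - 1))
    return np.mean(observed), np.mean(chance)
```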
individual reporting for post-deployment evals — a little manifesto (& new preprints!) tldr: end users have unique insights about how deployed systems are failing; we should figure out how to translate their experiences into formal evaluations of those systems.
Cool exploration of how pretraining data shapes LLMs on medical topics. One result: clinical jargon doesn't show up much in pretraining, but is prevalent in clinical notes (a much-discussed use case for LLMs). Nice use of @allen_ai's What's In My Big Data tool.
🩺 Open-source large language models now perform well across various clinical natural language processing tasks, even though they never see electronic health records. Where do they pick up that clinical knowledge? 🚀 We are excited to share our CHIL 2025 paper “Diagnosing our…
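For concreteness, a toy sketch of the kind of frequency comparison involved (not the WIMBD tool itself; the jargon list is hypothetical): count how often clinical abbreviations appear per million tokens in a corpus, and compare across corpora.

```python
# Compare clinical-jargon frequency (per million tokens) between corpora,
# e.g. a pretraining sample vs. a sample of clinical notes.
import re
from collections import Counter

JARGON = ["pt", "hx", "sob", "c/o", "npo"]  # hypothetical abbreviation list

def per_million(texts: list[str], terms: list[str]) -> dict:
    tokens = [t for doc in texts for t in re.findall(r"[a-z/]+", doc.lower())]
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {term: 1e6 * counts[term] / total for term in terms}
```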
Today is a good day to share my favorite KD fact, which is that he invested in @huggingface all the way back in... 2017. A true champion of open source AI 👑
Excited for the new release of #HuggingFace durant.ly/huggingface - proud investor! durant.ly/hfrelease
Saddened by @atulbutte's passing. When I was in high school, he was gracious enough to do an interview for our student-run science magazine. There are many good reasons he could've said no, but he chose to spend his time inspiring teenagers. It worked on me. RIP, Prof. Butte.

New work 🎉: conformal classifiers return sets of classes for each example, with a probabilistic guarantee the true class is included. But these sets can be too large to be useful. In our #CVPR2025 paper, we propose a method to make them more compact without sacrificing…
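Background sketch of the conformal setup the paper starts from (standard split conformal prediction, not the paper's new compaction method): calibrate a score threshold on held-out data so that the returned set contains the true class with probability at least 1 - alpha.

```python
# Split conformal prediction for classification: calibrate on held-out
# softmax scores, then build prediction sets for new examples.
import numpy as np

def conformal_threshold(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha=0.1):
    # nonconformity score: 1 minus the probability assigned to the true class
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # finite-sample-corrected quantile gives the coverage guarantee
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return q

def prediction_set(probs: np.ndarray, qhat: float) -> np.ndarray:
    # include every class whose nonconformity score is within the threshold
    return np.where(1.0 - probs <= qhat)[0]
```

The guarantee holds on average over calibration and test draws; the sets can still be large when the classifier is uncertain, which is the problem the paper addresses.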