Suraj Srinivas @ ICML
@Suuraj
ml researcher / trying to understand why deep learning works
One of my favourite (and most thought-provoking) ML papers!
# A new type of information theory

this paper is not super well-known, but it has changed my opinion of how deep learning works more than almost anything else. it says that we should measure the amount of information available in some representation based on how *extractable* it is,…
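The tweet is truncated and the paper isn't named here, but the description matches the "usable information" (V-information) framework of Xu et al. (ICLR 2020). Assuming that's the paper, a minimal sketch of the central definition:

```latex
% Predictive V-information (Xu et al., ICLR 2020), assuming this is the
% framework being described. \mathcal{V} is a restricted class of predictors
% (e.g., linear probes): information only counts if some f in \mathcal{V}
% can actually extract it.
H_{\mathcal{V}}(Y \mid X) = \inf_{f \in \mathcal{V}} \mathbb{E}_{x,y}\!\left[ -\log f[x](y) \right]
\qquad
I_{\mathcal{V}}(X \to Y) = H_{\mathcal{V}}(Y \mid \varnothing) - H_{\mathcal{V}}(Y \mid X)
```

With V = all functions this recovers Shannon mutual information; with V = linear probes, information that is present but not linearly extractable simply doesn't count, which is what "measure information by extractability" means.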
Here's how my recent papers & reviews are going:
* To solve a vision problem today, the sensible thing is to leverage a pre-trained VLM or video diffusion model. Such models implicitly represent a tremendous amount about the visual world that we can exploit.
* Figure out how to…
‼️🕚New paper alert with @ushabhalla_: Leveraging the Sequential Nature of Language for Interpretability (openreview.net/pdf?id=hgPf1ki…)! 1/n
Have you ever wondered whether a few instances of data contamination really lead to benchmark overfitting?🤔 Then our latest paper about the effect of data contamination on LLM evals might be for you!🚀 "How Much Can We Forget about Data Contamination?" (accepted at #ICML2025) shows…
## The case for more ambition

i wrote about how AI researchers should ask bigger and simpler questions, and publish fewer papers:
Why does Chain of Thought prompting actually work? @bohang_zhang will be talking about it today. Join us! @Suuraj @tverven
⏰⏰ Theory of Interpretable AI Seminar ⏰⏰ Chain-of-Thought: Why does explaining to LLMs using CoT prompting work? Join us on June 3, when @bohang_zhang will dive into the mechanisms behind chain-of-thought prompting — and what makes it so effective @tverven @Suuraj
We created a canvas that plugs into an image model’s brain. You can use it to generate images in real-time by painting with the latent concepts the model has learned. Try out Paint with Ember for yourself 👇
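The thread doesn't spell out the mechanism, so the following is only a plausible toy sketch of the idea (all names hypothetical, not the real implementation): nudge the model's intermediate activations along a learned concept direction, but only where the user painted.

```python
# Toy sketch of "painting with latent concepts" -- a guess at the rough idea,
# not the actual system. All names (acts, concept_vec, strength) are hypothetical.
import numpy as np

H, W, C = 64, 64, 16                       # spatial activation grid + channels
acts = np.random.randn(H, W, C)            # stand-in for intermediate activations
concept_vec = np.random.randn(C)           # direction for one learned concept
concept_vec /= np.linalg.norm(concept_vec)

mask = np.zeros((H, W))                    # the user's brush strokes
mask[20:40, 10:30] = 1.0                   # "paint" a rectangle

strength = 3.0                             # how hard to push toward the concept
steered = acts + strength * mask[..., None] * concept_vec
# A real system would run the rest of the image model on `steered` and
# re-render in real time as the mask changes.
```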
we live in a world where "verification is easier than generation" is no longer true
#NLProc AI Co-Scientists 🤖 can generate ideas, but can they spot mistakes? (not yet! 🚫) In our recent paper, we introduce SPOT, a dataset of STEM manuscripts (math, materials science, chemistry, physics, etc.), annotated with real errors. SOTA models like o3, gemini-2.5-pro…
data attribution is the most neglected thing in interpretability and people should join me in working on it
Curious about feature attribution? SHAP & LIME treat features independently—but features interact! Come hear how to "Disentangle Interactions and Dependencies in Feature Attribution" Tuesday (tomorrow!) 4pm CET, 10am ET @Suuraj @tverven
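A toy illustration of the interaction point (mine, not from the talk): on a function that is a pure interaction between two features, any attribution that perturbs one feature at a time sees essentially no signal.

```python
# f(x1, x2) = x1 * x2 is a pure interaction with no main effects: flipping one
# feature alone has ~zero *average* effect, even though the two features
# jointly determine the output entirely.
import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=(100_000, 2))   # independent +/-1 features
f = lambda a: a[:, 0] * a[:, 1]

for i in range(2):                               # flip one feature at a time
    x_flip = x.copy()
    x_flip[:, i] *= -1
    print(f"mean marginal effect of x{i+1}: {np.mean(f(x) - f(x_flip)):+.4f}")
# Both means are ~0: one-feature-at-a-time perturbation misses the interaction.
```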
⏰⏰Theory of Interpretable AI Seminar ⏰⏰ Interested in Feature Attribution Explanations? In two weeks, May 6, Gunnar König @gcskoenig will talk about "Disentangling Interactions and Dependencies in Feature Attribution" @tverven @Suuraj
In April 2024, we launched the Theory of Interpretable AI seminar, aiming to build a community and unsure if we'd even have enough speakers. A year later, we're still growing. New to the seminar? Join us in building the foundations of XAI together @tverven @Suuraj 1/n
Today in **two hours** @mirco_mutti will talk about interpretable bandits Zoom link: uva-live.zoom.us/j/87120549999 @Suuraj @tverven
Can we get a *short* and *interpretable* policy for multi-armed bandits that is guaranteed to perform well? @mirco_mutti will present our (w/ @shiemannor and Jeongyeol Kwon) recent work on this cool new problem in the Theory of Interpretable AI today! (zoom link below)
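For intuition about what a "short and interpretable" policy can look like, here's the textbook explore-then-commit policy in a few lines; this is a standard illustrative example, not necessarily the construction in the paper.

```python
# Explore-then-commit: pull each arm k times, then commit to the best
# empirical mean. The whole policy *is* its explanation ("tried each arm
# 50 times, arm 2 looked best, played it from then on").
import random

def explore_then_commit(arms, k, horizon):
    total, means = 0.0, []
    for arm in arms:                               # explore phase
        samples = [arm() for _ in range(k)]
        total += sum(samples)
        means.append(sum(samples) / k)
    best = arms[means.index(max(means))]           # commit phase
    remaining = horizon - k * len(arms)
    total += sum(best() for _ in range(remaining))
    return total

arms = [lambda: random.gauss(0.3, 1), lambda: random.gauss(0.5, 1)]
print(explore_then_commit(arms, k=50, horizon=1000))
```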
DID I CRACK IT? I think I figured out at least a chunk of the math. It's trade deficit divided by their exports. EU: exports 531.6, imports 333.4, deficit 198.2. 198.2/531.6 ≈ 37%, close to the chart's 39. Israel: exports 22.2, imports 14.8, deficit 7.4. 7.4/22.2 ≈ 33%.
FULL LIST: Liberation Day
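The arithmetic above checks out; a minimal reproduction (figures taken from the tweet, in $bn):

```python
# "Tariff" = trade deficit / that partner's exports to the US, as a percentage.
pairs = {          # (their exports to the US, their imports from the US)
    "EU":     (531.6, 333.4),
    "Israel": (22.2, 14.8),
}
for name, (exports, imports) in pairs.items():
    deficit = exports - imports
    print(f"{name}: {deficit:.1f}/{exports:.1f} = {100 * deficit / exports:.0f}%")
# EU: 198.2/531.6 = 37%   (chart says 39)
# Israel: 7.4/22.2 = 33%  (chart says 33)
```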
Take a break from arxiv/LW/AF. Sit in the woods with a random textbook and mull new ideas away from interp community lockstep. Diverge. Don’t compete with a saturated subtopic, maybe you’ll get to take weekends off. Premature overinvestment comes from monoculture.
So what should the community do? I'd guess we're over-invested in fundamental SAE research, but we shouldn't abandon it completely; SAEs remain a valuable tool, esp. for exploration and debugging. I'm most keen on applied work, and on making targeted fixes for fundamental issues.