Subhash Kantamneni
@thesubhashk
incoming @AnthropicAI. prev @mit Tegmark group. mech interp & alignment
(1/N) LLMs represent numbers on a helix? And use trigonometry to do addition? Answers below 🧵
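rough sketch of what a helical number representation looks like (the periods and numbers here are illustrative, not the ones fit in the paper):

```python
import numpy as np

def helix(a, periods=(2, 5, 10, 100)):
    """Embed integer a as a 'generalized helix': one linear coordinate
    plus a (cos, sin) pair for each period T."""
    feats = [float(a)]
    for T in periods:
        feats += [np.cos(2 * np.pi * a / T), np.sin(2 * np.pi * a / T)]
    return np.array(feats)

# the "clock" picture: cos/sin of a+b can be built from cos/sin of a and b
# via the angle-addition identities, i.e. addition becomes a rotation
a, b, T = 37, 25, 10
ca, sa = np.cos(2 * np.pi * a / T), np.sin(2 * np.pi * a / T)
cb, sb = np.cos(2 * np.pi * b / T), np.sin(2 * np.pi * b / T)
print(np.isclose(ca * cb - sa * sb, np.cos(2 * np.pi * (a + b) / T)))  # True
```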
the only way i would make a map!
We’re launching Mundi, the first open-source web GIS built for AI. After years of training geospatial AI models, we decided now is the moment to build the GIS software of the next decade. After the AI Vectorizer, Georeferencer, and Kue (our LLM agent inside QGIS), we realized…
Today, the most competent AI systems in almost *any* domain (math, coding, etc.) are broadly knowledgeable across almost *every* domain. Does it have to be this way, or can we create truly narrow AI systems? In a new preprint, we explore some questions relevant to this goal...
excited to give a talk tmrw at 1pm est!
This week at Deep Learning: Classics and Trends we're kicking off a new five-part mini-series on LLM Interpretability. Up first: @thesubhashk shows how LLMs represent numbers on a helix and use it to add! Join Friday at 10am PT, zoom here: mlcollective.org/dlct/
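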
happy to be a contributor on this set of ai safety priorities!
This Singapore conference was an amazing AI safety comeback after the Paris flop: great consensus between a who's who from the US, China, top companies, AISIs, etc. on what safety research needs to get done: aisafetypriorities.org
cool work! i rly like the idea of “here’s this wacky model behavior, let’s use interp to understand it!”
1/6: A recent paper shows that LLMs are "self aware": when trained to exhibit a behavior like "risk taking", LLMs self report being risky. In a recent blog post, we explore what's happening here: some self awareness behaviors are caused by a simple learned steering vector!🧵
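A minimal sketch of the steering-vector picture (random vectors as stand-ins for real activations — the actual direction is extracted from the fine-tuned model):

```python
import numpy as np

# the hypothesis: fine-tuning on "risk taking" is well-approximated by adding
# one fixed direction v to the residual stream. the same added direction then
# shifts both the model's behavior *and* its self-reports ("I am risky").
rng = np.random.default_rng(0)
d_model = 64

resid = rng.normal(size=d_model)      # stand-in residual-stream activation
v_risky = rng.normal(size=d_model)    # stand-in learned "risk-taking" direction
v_risky /= np.linalg.norm(v_risky)

alpha = 4.0                           # steering strength (illustrative)
steered = resid + alpha * v_risky     # one edit, two downstream effects
```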
hey i’m at ICLR! If you’re interested in chatting about mech interp (especially unsupervised alternatives to SAEs) or alignment (weak-to-strong oversight is on my mind) hmu!
nice work combining lagrangian and hamiltonian neural nets! ideally we'd want neural nets to learn physical laws from data, and then we'd extract the insights they've learned to improve our own understanding of physics!
Would two AI scientists disagree with each other, even if trained on the same data? After seeing classical physics, AI scientists disagree at first but converge to known theories (Lagrangian/Hamiltonian) when data become diverse. Check out our paper: arxiv.org/abs/2504.02822
Pretty awesome to see Anthropic studying addition in a production model! My intuition is that "number ending in 6" type features are calculated as cos(2pi(a-6)/10) - using trigonometric reps of numbers! Would love to see if we can dig out these low level computations
Claude wasn’t designed to be a calculator; it was trained to predict text. And yet it can do math "in its head". How? We find that, far from merely memorizing the answers to problems, it employs sophisticated parallel computational paths to do "mental arithmetic".
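here's that intuition as a tiny sketch (purely illustrative - not Claude's actual circuit): a "number ends in 6" feature as the peak of a period-10 cosine.

```python
import numpy as np

# cos(2*pi*(a - 6)/10) is exactly 1 when a ends in 6 and smaller otherwise
def ends_in_6(a):
    return np.cos(2 * np.pi * (a - 6) / 10)

for a in [6, 16, 36, 58, 71]:
    print(a, round(float(ends_in_6(a)), 3))
# 6, 16, 36 -> 1.0; 58 -> ~0.309; 71 -> -1.0
```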
Pretty awesome work! I think there’s a lot of promise in monitoring and Docent seems like a great step towards that
AI models are *not* solving problems the way we think. Using Docent, we find that Claude solves *broken* eval tasks - memorizing answers & hallucinating them! details in 🧵. we really need to look at our data harder, and it's time to rethink how we do evals...
Really cool to see interp focused metrics and holistic measures of SAE quality (including probing!)
We're excited to announce the release of SAE Bench 1.0, our suite of Sparse Autoencoder (SAE) evaluations! We have also trained / evaluated a suite of open-source SAEs across 7 architectures. This has led to exciting new qualitative findings! Our findings in the 🧵 below 👇
A First Step Towards Interpretable Protein Structure Prediction
With SAEFold, we enable mechanistic interpretability on ESMFold, a protein structure prediction model, for the first time. Watch @NithinParsan demo a case study here w/ links for paper & open-source code 👇
New blog post on SAE probing! We argue that SAEs should be evaluated on downstream interp tasks. Unfortunately, SAEs weren’t differentially useful for probing. We think this is a negative result for current SAEs, but we’re hopeful for new SAE/interp methods!
We wrote up a blog post with some takeaways from our SAE probing project! TLDR, we think future work should focus on showing SAEs are differentially useful on downstream tasks, or should focus on ambitious new types of SAEs/other novel techniques. lesswrong.com/posts/osNKnwiJ…
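roughly, the comparison we care about - does a probe on SAE latents beat the same probe on raw activations? (synthetic stand-ins below; real runs use model activations and trained SAEs)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d_model, d_sae = 2000, 64, 512

acts = rng.normal(size=(n, d_model))                     # stand-in residual activations
labels = (acts[:, 0] + 0.1 * rng.normal(size=n)) > 0     # stand-in binary concept

W_enc = rng.normal(size=(d_model, d_sae))                # stand-in SAE encoder
latents = np.maximum(acts @ W_enc, 0.0)                  # ReLU SAE latents

# train the same linear probe on both representations and compare accuracy;
# "differentially useful" means the SAE column should come out ahead
for name, X in [("raw activations", acts), ("SAE latents", latents)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(name, probe.score(X_te, y_te))
```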