Iván Arcuschin
@IvanArcus
Independent Researcher | AI Safety & Software Engineering
🚨Wanna know how to increase reasoning behaviors in thinking LLMs? Read our recent work! 👇
Can we actually control reasoning behaviors in thinking LLMs? Our @iclr_conf workshop paper is out! 🎉 We show how to steer DeepSeek-R1-Distill’s reasoning: make it backtrack, add knowledge, test examples. Just by adding steering vectors to its activations! Details in 🧵👇
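The core intervention is simple: add a fixed vector to the model's hidden activations at inference time. A minimal sketch of that step, assuming a NumPy array of residual-stream activations and a precomputed steering vector (the names `apply_steering`, `alpha`, and the toy shapes are illustrative, not the paper's actual code):

```python
import numpy as np

def apply_steering(activations, steering_vector, alpha=1.0):
    """Add a scaled steering vector to every token position's activation.

    Hypothetical helper: in practice this would run inside a forward hook
    at a chosen transformer layer; how the vector is extracted and which
    layer it targets follow the paper, not this sketch.
    """
    return activations + alpha * steering_vector

# Toy example: 2 token positions, hidden size 4
acts = np.zeros((2, 4))
vec = np.array([1.0, 0.0, -1.0, 0.0])  # e.g. a "backtracking" direction
steered = apply_steering(acts, vec, alpha=2.0)
```

Scaling `alpha` up or down then strengthens or weakens the targeted reasoning behavior.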
Can frontier models hide secret information and reasoning in their outputs? We find early signs of steganographic capabilities in current frontier models, including Claude, GPT, and Gemini. 🧵
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren't necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵
With @Butanium_ and @NeelNanda5 we've just published a post on model diffing that extends our previous paper. Rather than trying to reverse-engineer the full fine-tuned model, model diffing focuses on understanding what makes it different from its base model internally.
AI Control is a promising approach for mitigating misalignment risks, but will it be widely adopted? The answer depends on cost. Our new paper introduces the Control Tax: how much does it cost to run the control protocols? (1/8) 🧵
🚀 Excited to announce the launch of the AISAR Scholarship, a new initiative to promote AI Safety research in Argentina! 🇦🇷 Together with Agustín Martinez Suñé, we've created this program to support both Argentine established researchers and emerging talent, encouraging…

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work? We propose 😎 𝗠𝗜𝗕: a Mechanistic Interpretability Benchmark!
New paper w/@jkminder & @NeelNanda5! What do chat LLMs learn in finetuning? Anthropic introduced a tool for this: crosscoders, an SAE variant. We find key limitations of crosscoders & fix them with BatchTopK crosscoders. This finds interpretable and causal chat-only features! 🧵
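The BatchTopK idea, in isolation: instead of keeping the top-k latent activations per example, keep the k·batch_size largest activations across the whole batch, zeroing the rest. A minimal NumPy sketch of just that sparsity rule (a real crosscoder wraps this in a trained encoder/decoder pair; `batch_topk` and its tie-breaking are illustrative):

```python
import numpy as np

def batch_topk(latents, k):
    """Keep the k * batch_size largest latent activations across the batch.

    `latents` is (batch_size, n_latents). This is only the sparsity step of
    a BatchTopK architecture, not a full crosscoder.
    """
    batch_size = latents.shape[0]
    n_keep = k * batch_size
    flat = latents.flatten()
    if n_keep >= flat.size:
        return latents.copy()
    # Value of the n_keep-th largest activation in the whole batch
    threshold = np.partition(flat, -n_keep)[-n_keep]
    return latents * (latents >= threshold)

# Toy batch: 2 examples, 3 latents, k=1 -> keep the 2 largest overall
out = batch_topk(np.array([[3.0, 1.0, 0.5],
                           [2.0, 0.2, 4.0]]), k=1)
```

Pooling the budget across the batch lets sparsity vary per example, which is one way to avoid the dead or shrunken features that fixed per-example top-k can produce.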