Fazl Barez @ICML2025
@FazlBarez
Building 🤖 | Let's build AIs we can trust!
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: when models explain their Chain-of-Thought (CoT) steps, they aren't necessarily revealing their true reasoning. Spoiler: the transparency of CoT can be an illusion. (1/9) 🧵

Maybe I'll live-tweet the Actionable Interpretability workshop panel
Come see @edelwax present our poster! Ballroom A - west
I’ll be at #ICML2025 – come say hi and talk to me about responsible AI👋 🎤 Speaking (14th): Post-AGI Civilizational Equilibria post-agi.org 💭 Panel @askalphaxiv (14th eve) lu.ma/n0yavto0 📝 Main-Conf Poster (16th): PoisonBench icml.cc/virtual/2025/p… 👀…
Yesterday’s panel that I ran at ICML on “The Science Singularity.” The room was so packed that people sat down so those behind them could see. Computer scientists are lovely people. Big thanks to @askalphaxiv @tdietterich @pzakin and @FazlBarez!
We’re excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣 How can we interpret the algorithms and representations underlying complex behavior in deep learning models? 🌐 coginterp.github.io/neurips2025/ 1/
I'll be at ICML Thurs, Fri, Sat! I'll be at our Gradual Disempowerment poster at 11AM Thursday. Also: I'm planning to spend the rest of the year focused on raising awareness about AI risks. And I'm looking for postdocs/RAs, and for PhD students to start Jan or Sept 2026.
Calling this "training for interpretability" is misleading... it's more like "training that doesn't obviously degrade interpretability". Nobody actually has a method to train for interpretability.
I am extremely excited about the potential of chain-of-thought faithfulness & interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview. As AI systems spend more compute working, e.g., on long-term research problems, it is…
If you don't train your CoTs to look nice, you could get some safety from monitoring them. This seems good to do! But I'm skeptical this will work reliably enough to be load-bearing in a safety case. Plus as RL is scaled up, I expect CoTs to become less and less legible.
A simple AGI safety technique: AI’s thoughts are in plain English; just read them. We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc. threaten transparency. Experts from many orgs agree we should try to preserve it:…
🤓Calling all ML nerds! 🤓 Join me at ICML for the @askalphaxiv happy hour! We have food, drinks, and a panel. An amazing panel. Really the best panel, with @FazlBarez, @pzakin, and @tdietterich! Topic: "The science singularity." lu.ma/n0yavto0
Introducing: Full-Stack Alignment 🥞 A research program dedicated to co-aligning AI systems *and* institutions with what people value. It's the most ambitious project I've ever undertaken. Here's what we're doing: 🧵
This is in collaboration with a huge list of really really amazing researchers, including: @edelwax @xuanalogue @klingefjord @j_foerst @IasonGabriel @Dr_Atoosa @vinnylarouge @atrishasarkar @bakkermichiel @RyanOthKearns @ellie__hain @DavidDuvenaud @FazlBarez @FranklinMatija ...
See you all at the @ActInterp workshop! Gonna be good
🚨Meet our panelists at the Actionable Interpretability Workshop @ActInterp at @icmlconf! Join us July 19 at 4pm for a panel on making interpretability research actionable, its challenges, and how the community can drive greater impact. @nsaphra @saprmarks @kylelostat @FazlBarez