Transluce

@TransluceAI

Open and scalable technology for understanding AI systems.

Joined October 2024

10 Following

8K Followers

Transluce Retweeted

Actionable Interpretability Workshop ICML2025 @ActInterp · Jul 19

Huge thanks to Sarah Schwettmann for a fascinating keynote on "AI Investigators for Understanding AI Systems" 🤖 @cogconfluence @TransluceAI

Transluce @TransluceAI · Jul 17

At #ICML2025? Come chat about investigator agents and model behavior with @ChowdhuryNeil and @_ddjohnson at West Exhibition Hall #1012, now until 1:30pm

Transluce @TransluceAI · Jul 14

We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria (post-agi.org) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am…

Transluce @TransluceAI · Jul 8

Building a science of model understanding that addresses real-world problems is one of the key AI challenges of our time. I'm so excited this workshop is happening! See you at #ICML2025 ✨

Mor Geva @megamor2 · Jul 8

Going to #icml2025? Don't miss the Actionable Interpretability Workshop (@ActInterp)! We've got an amazing lineup of speakers, panelists, and papers, all focused on leveraging insights from interpretability research to tackle practical, real-world problems ✨

Transluce @TransluceAI · Jul 3

Transluce is hosting an #ICML2025 happy hour on Thursday, July 17 in Vancouver. Come meet us and learn more about our work! 🥂 lu.ma/1w854pjn

Transluce is hosting a happy hour at ICML 2025 ✨ Come meet members of our team and learn more about Transluce's vision and research. There will be drinks,…

Transluce @TransluceAI · Jun 14

I'll be interning at @TransluceAI for the summer doing interp too 🫡; exciting to be in SF with this bro.

Aryaman Arora @aryaman2020 · Jun 14

I'll be interning at @TransluceAI for the summer doing interp 🫡 will be staying in SF

Transluce @TransluceAI · Jun 5

Ever wondered how likely your AI model is to misbehave? We developed the *propensity lower bound* (PRBO), a variational lower bound on the probability of a model exhibiting a target (misaligned) behavior.

Transluce @TransluceAI · Jun 5

Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎

Transluce @TransluceAI · Apr 21

We're flying to Singapore for #ICLR2025! ✈️ Want to chat with @ChowdhuryNeil, @JacobSteinhardt and @cogconfluence about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 forms.gle/4EHLvYnMfdyrV5…

Transluce @TransluceAI · Apr 18

The problem with RLHF: training an AI to make humans happy will explicitly push it beyond the boundary of human judgment. It teaches the AI how to lie to us. @mengk20 @cogconfluence's docent surfaces this as rampant prevarication. Interpretability is the key problem in AI.

Transluce @TransluceAI · Apr 16

We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…
