Mor Geva (@megamor2)

Mor Geva Retweeted

Bethlehem Tekola, PhD@Bethlehemtekola · Jul 19

"writing is not only about reporting results; it also provides a tool to uncover new thoughts and ideas. Writing compels us to think"

Mor Geva Retweeted

Aryaman Arora@aryaman2020 · Jul 19

maybe I will live tweet the actionable interp workshop panel

Mor Geva@megamor2 · Jul 19

Starting soon!! #ICML2025

Actionable Interpretability Workshop ICML2025@ActInterp · Jul 18

🚨The Actionable Interpretability Workshop is happening tomorrow at ICML! Join us for an exciting lineup of speakers, nearly 70 posters, and a great panel discussion 🙌 Don’t miss it! 🔍⚙️ @icmlconf @ActInterp

Mor Geva Retweeted

Oded Rechavi@OdedRechavi · Jul 18

When the reviewers' identities are revealed

Mor Geva@megamor2 · Jul 16

Suppose you're reading something (let it be a paper, review, email, whatever) and as you read you get the sense it was largely written by an LLM. What's your first reaction? How does it change how you read it? EDIT: by "largely written by an LLM" I mean the writer heavily used…

Mor Geva Retweeted

Tal Haklay ✈️ACL@tal_haklay · Jul 10

🚨Meet our panelists at the Actionable Interpretability Workshop @ActInterp at @icmlconf! Join us July 19 at 4pm for a panel on making interpretability research actionable, its challenges, and how the community can drive greater impact. @nsaphra @saprmarks @kylelostat @FazlBarez

Mor Geva@megamor2 · Jul 8

Building a science of model understanding that addresses real-world problems is one of the key AI challenges of our time. I'm so excited this workshop is happening! See you at #ICML2025 ✨

Mor Geva@megamor2 · Jul 8

Going to #icml2025? Don't miss the Actionable Interpretability Workshop (@ActInterp)! We've got an amazing lineup of speakers, panelists, and papers, all focused on leveraging insights from interpretability research to tackle practical, real-world problems ✨

Mor Geva@megamor2 · Jul 8

Gonna be there /w the g.o.a.t. @Pranav_AL, can't wait for it! Thank you so much for the workshop 🚀!

Mor Geva@megamor2 · Jul 8

Going to #icml2025? Don't miss the Actionable Interpretability Workshop (@ActInterp)! We've got an amazing lineup of speakers, panelists, and papers, all focused on leveraging insights from interpretability research to tackle practical, real-world problems ✨

Mor Geva@megamor2 · Jul 8

Going to #icml2025? Don't miss the Actionable Interpretability Workshop (@ActInterp)! We've got an amazing lineup of speakers, panelists, and papers, all focused on leveraging insights from interpretability research to tackle practical, real-world problems ✨

Mor Geva Retweeted

Ido Cohen@IdoC0hen · Jul 6

A Vision-Language Model can answer questions about Robin Williams. It can also recognize him in a photo. So why does it FAIL when asked the same questions using his photo instead of his name? A thread on our new #acl2025 paper that explores this puzzle 🧵

Mor Geva@megamor2 · Jun 18

What makes some jailbreak suffixes stronger than others? We looked into the inner workings of GCG-like attacks and found a cool hijacking mechanism that strong attacks heavily rely on. This also lets us enhance attacks and defenses against them. Check out @matanbt 's thread 👇

Matan Ben-Tov@matanbt · Jun 18

What makes or breaks powerful jailbreak suffixes? 🔓🤖 We find that: 🥷 they work by hijacking the model’s context; ♾️ the more universal a suffix is the stronger its hijacking; ⚔️🛡️ utilizing these insights, it is possible to both enhance and mitigate these attacks. 🧵

Mor Geva Retweeted

Matan Ben-Tov@matanbt · Jun 18

What makes or breaks powerful jailbreak suffixes? 🔓🤖 We find that: 🥷 they work by hijacking the model’s context; ♾️ the more universal a suffix is the stronger its hijacking; ⚔️🛡️ utilizing these insights, it is possible to both enhance and mitigate these attacks. 🧵

Mor Geva@megamor2 · Jun 18

New personal record: slept through the sirens 💤

Mor Geva@megamor2 · Jun 13

🇮🇱🇮🇱🇮🇱

Mor Geva Retweeted

Sohee Yang@soheeyang_ · Jun 13

🚨 New Paper 🧵 How effectively do reasoning models reevaluate their thought? We find that:
- Models excel at identifying unhelpful thoughts but struggle to recover from them
- Smaller models can be more robust
- Self-reevaluation ability is far from true meta-cognitive awareness
