Itay Itzhak
@Itay_itzhak_
NLProc, deep learning, and machine learning. Ph.D. student @TechnionLive and @HebrewU
🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc

Coming soon - an LLM in your group chat
Check out @niveckhaus's excellent work, developing a model capable of playing with human players in asynchronous settings and deciding when to intervene or when to stay quiet 🤐
🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐niveck.github.io/Time-to-Talk 🧵1/7
🚨Meet our panelists at the Actionable Interpretability Workshop @ActInterp at @icmlconf! Join us July 19 at 4pm for a panel on making interpretability research actionable, its challenges, and how the community can drive greater impact. @nsaphra @saprmarks @kylelostat @FazlBarez
Now accepted to #COLM2025! We formally define hidden knowledge in LLMs and show its existence in a controlled study. We even show that a model can know the answer yet fail to generate it in 1,000 attempts 😵 Looking forward to presenting and discussing our work in person.
🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this “hidden knowledge”? In our new paper, we clearly define this concept and design controlled experiments to test it. 1/🧵
This needed to be said!
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren't necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵
🕊️ DOVE is a living benchmark! Just pushed major updates: 📊 Dataset expansion: Added ~5700 MMLU examples with Llama-70B, each tested across 100 different prompt variations = 570K new predictions! 📈 Website upgrades: New interactive plots throughout: slab-nlp.github.io/DOVE/
Care about LLM evaluation? 🤖🤔 We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs on different prompts, domains, tokens, models... Join our community effort to expand it with YOUR model predictions & become a co-author!
VLMs perform better when answering questions about text than when answering the same questions about images. But why? And how can we fix it? We investigate this gap from a mechanistic interpretability perspective, and use our findings to close a third of it! 🧵
Preferences drive modern LLM research and development: from model alignment to evaluation. But how well do we understand them? Excited to share our new preprint: Multi-domain Explainability of Preferences arxiv.org/abs/2505.20088 @roireichart @LiatEinDor 🧵👇 1/11
Tried steering with SAEs and found that not all features behave as expected? Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧵
Y'all are wasting compute on reasoning tokens models don't need. Check out this cool new paper by @MichaelHassid!
The longer a reasoning LLM thinks, the more likely it is to be correct, right? Apparently not. Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”. Link: arxiv.org/abs/2505.17813 1/n
BlackboxNLP will be co-located with #EMNLP2025 in Suzhou this November! This edition will feature a new shared task on circuit/causal-variable localization in LMs; details: blackboxnlp.github.io/2025/task If you're into mech interp and care about evaluation, please submit!
🚨New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive data—emails, SSNs, URLs from their training. We propose a surgical method to unlearn it. 🧵👇w/@boknilev @mtutek 1/8
Our paper "Position-Aware Circuit Discovery" got accepted to ACL! 🎉 Huge thanks to my collaborators🙏 @OrgadHadas @davidbau @amuuueller @boknilev See you in Vienna! 🇦🇹 #ACL2025 @aclmeeting