Itay Yona
@itay__yona
🚨 New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive data from their training sets: emails, SSNs, URLs. We propose a surgical method to unlearn it. 🧵👇 w/ @boknilev @mtutek 1/8
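The mechanism in miniature: project a hidden vector through the unembedding matrix into vocabulary logits, then push the sensitive token down the resulting ranking. Below is a minimal sketch assuming a PyTorch unembedding matrix `W_U` of shape [vocab, d]; the function names and the simple iterative demotion loop are illustrative, not the paper's exact editing procedure.

```python
import torch

def token_rank(h, W_U, tok_id):
    """Rank of tok_id when hidden vector h is projected to vocab logits (0 = top)."""
    logits = W_U @ h                               # [vocab]
    return int((logits > logits[tok_id]).sum())

def demote_token(h, W_U, tok_id, target_rank=1000, step=0.1, max_iter=200):
    """Illustrative rank edit: nudge h away from the token's unembedding
    direction until the sensitive token falls to at least target_rank."""
    h = h.clone()
    direction = W_U[tok_id] / W_U[tok_id].norm()
    for _ in range(max_iter):
        if token_rank(h, W_U, tok_id) >= target_rank:
            break                                  # token is buried deep enough
        h = h - step * direction                   # lower the token's logit
    return h
```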
🧠 New #ICLR2025 paper: "Explain Yourself, Briefly!" We introduce Sufficient Subset Training (SST)—a self-supervised method enabling neural networks to generate concise, faithful explanations as part of their predictions. Read more: arxiv.org/abs/2502.03391
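One way to picture the idea: the network should reproduce its own prediction from a small, self-selected subset of the input. A hedged sketch of such an objective, assuming a classifier `model` and a learned `masker` that scores feature relevance; this generic consistency-plus-sparsity loss is a stand-in, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sst_style_loss(model, masker, x, lam=0.01):
    """Hypothetical SST-style objective: the prediction from a sparse
    feature subset should match the prediction from the full input."""
    y_full = model(x)                              # logits from the full input
    m = torch.sigmoid(masker(x))                   # soft per-feature mask in [0, 1]
    y_sub = model(x * m)                           # logits from the masked subset
    consistency = F.kl_div(y_sub.log_softmax(-1),
                           y_full.softmax(-1), reduction="batchmean")
    sparsity = m.mean()                            # encourage small subsets
    return consistency + lam * sparsity
```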
⚡️🚀 Mike's Daily Paper, 11.11.24: ⚡️🚀 Stealing Part of a Production Language Model 1️⃣ It has been a long time (so long I can't even roughly remember when) since I last reviewed a paper on how deep models can be hacked. There is an entire field called adversarial learning, in which researchers develop defense mechanisms against attacks that try to steal something…
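The linear-algebra core of the attack is easy to demo: final-layer logits are a linear image of the hidden state, so logit vectors gathered across many prompts span only a hidden-size-dimensional subspace, and the singular values of the stacked matrix give the hidden width away. A self-contained NumPy simulation with synthetic weights (no real API; all numbers are made up):

```python
import numpy as np

vocab, hidden, n_queries = 5000, 256, 1024
W_out = np.random.randn(vocab, hidden)   # stand-in for the model's output projection

def query_logits():
    h = np.random.randn(hidden)          # stand-in for a prompt's final hidden state
    return W_out @ h                     # logits = W_out @ h, rank limited by `hidden`

L = np.stack([query_logits() for _ in range(n_queries)])   # [n_queries, vocab]
s = np.linalg.svd(L, compute_uv=False)
est_hidden = int((s > s[0] * 1e-8).sum())  # count non-negligible singular values
print(est_hidden)                          # ≈ 256: the hidden width is recovered
```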
📢 New security risk for Mixture-of-Experts (MoE)! 📢 @GoogleDeepMind research reveals a new kind of vulnerability that could leak user prompts in MoE models. Our "MoE Tiebreak Leakage" attack exploits the Expert Choice Routing strategy. arxiv.org/pdf/2410.22884
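Expert Choice Routing inverts the usual MoE scheme: each expert picks its top-capacity tokens from the whole batch, so whether a token gets processed depends on the other sequences it is batched with. A toy sketch of that batch-level dependence, with made-up scores; the real attack's CUDA tie-breaking details are more subtle.

```python
import torch

def expert_choice_route(scores, capacity):
    """Expert-choice routing: each expert selects its top-`capacity` tokens.
    The selection depends on the whole batch, so co-batched sequences
    influence each other's routing - the side channel the attack exploits."""
    picked = {}
    for e in range(scores.shape[1]):               # scores: [tokens, experts]
        top = torch.topk(scores[:, e], k=capacity) # tie-breaking is position-dependent
        picked[e] = top.indices.tolist()
    return picked

scores = torch.tensor([[0.9, 0.1],
                       [0.9, 0.2],   # exact tie with token 0 on expert 0
                       [0.3, 0.8]])
print(expert_choice_route(scores, capacity=1))
# {0: [0], 1: [2]}: token 1 loses the tie purely because of its batch position,
# so an attacker observing such drops can infer what else was in the batch.
```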
We have been awarded two Best Paper Awards at @icmlconf 2024 for 'Stealing Part of a Production Language Model' and 'Considerations for Differentially Private Learning with Large-Scale Public Pretraining'! arxiv.org/abs/2403.06634 arxiv.org/abs/2212.06470
It's finally here. Q* rings true. Tiny LLMs are as good at math as a frontier model. By using the same techniques Google used to solve Go (MCTS and backprop), Llama 8B gets 96.7% on the GSM8K math benchmark! That's better than GPT-4, Claude, and Gemini, with 200x fewer parameters!
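For the record, the "backprop" in AlphaGo-style search is the value-backup phase of Monte Carlo Tree Search, not gradient backpropagation. A generic MCTS-over-candidate-answers skeleton, where `refine` (ask the model for a revised answer) and `score` (self-evaluate an answer) are hypothetical callbacks; any concrete system would supply its own versions of both.

```python
import math

class Node:
    def __init__(self, answer, parent=None):
        self.answer, self.parent = answer, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    """Upper-confidence score balancing exploitation and exploration."""
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_answer, refine, score, iterations=100):
    """Select a leaf by UCT, expand it with a refined answer, evaluate it,
    then backpropagate the reward up the tree (the 'backprop' of MCTS)."""
    root = Node(root_answer)
    for _ in range(iterations):
        node = root
        while node.children:                            # selection
            node = max(node.children, key=uct)
        child = Node(refine(node.answer), parent=node)  # expansion
        node.children.append(child)
        reward = score(child.answer)                    # evaluation
        while child:                                    # backpropagation
            child.visits += 1
            child.value += reward
            child = child.parent
    return max(root.children, key=lambda n: n.visits).answer
```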
I'm very interested in learning about other fields that reverse engineer complex systems, & seeing what lessons might transfer to neural networks. I had a great call with Itay Yona, a software/hardware reverse engineering expert; here are some notes: neelnanda.io/mechanistic-in…