Itay Yona
@itay__yona
🚨 New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive data from their training sets: emails, SSNs, URLs. We propose a surgical method to unlearn it. 🧵👇 w/ @boknilev @mtutek 1/8
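The mechanism in miniature: project a hidden vector through the unembedding matrix into vocabulary logits, then push the sensitive token down the resulting ranking. Below is a minimal sketch assuming a PyTorch unembedding matrix `W_U` of shape [vocab, d]; the function names and the simple iterative demotion loop are illustrative, not the paper's exact editing procedure.

```python
import torch

def token_rank(h, W_U, tok_id):
    """Rank of tok_id when hidden vector h is projected to vocab logits (0 = top)."""
    logits = W_U @ h                               # [vocab]
    return int((logits > logits[tok_id]).sum())

def demote_token(h, W_U, tok_id, target_rank=1000, step=0.1, max_iter=200):
    """Illustrative rank edit: nudge h away from the token's unembedding
    direction until the sensitive token falls to at least target_rank."""
    h = h.clone()
    direction = W_U[tok_id] / W_U[tok_id].norm()
    for _ in range(max_iter):
        if token_rank(h, W_U, tok_id) >= target_rank:
            break                                  # token is buried deep enough
        h = h - step * direction                   # lower the token's logit
    return h
```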
🧠 New #ICLR2025 paper: "Explain Yourself, Briefly!" We introduce Sufficient Subset Training (SST)—a self-supervised method enabling neural networks to generate concise, faithful explanations as part of their predictions. Read more: arxiv.org/abs/2502.03391
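One way to picture the idea: the network should reproduce its own prediction from a small, self-selected subset of the input. A hedged sketch of such an objective, assuming a classifier `model` and a learned `masker` that scores feature relevance; this generic consistency-plus-sparsity loss is a stand-in, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sst_style_loss(model, masker, x, lam=0.01):
    """Hypothetical SST-style objective: the prediction from a sparse
    feature subset should match the prediction from the full input."""
    y_full = model(x)                              # logits from the full input
    m = torch.sigmoid(masker(x))                   # soft per-feature mask in [0, 1]
    y_sub = model(x * m)                           # logits from the masked subset
    consistency = F.kl_div(y_sub.log_softmax(-1),
                           y_full.softmax(-1), reduction="batchmean")
    sparsity = m.mean()                            # encourage small subsets
    return consistency + lam * sparsity
```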
⚡️🚀 Mike's Daily Paper, 11.11.24: ⚡️🚀 Stealing Part of a Production Language Model 1️⃣ It has been a long time (so long I can't even roughly remember when) since I last reviewed a paper on how deep models can be hacked. There is an entire field called adversarial learning, in which researchers develop defense mechanisms against attacks that try to steal something…
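The linear-algebra core of the attack is easy to demo: final-layer logits are a linear image of the hidden state, so logit vectors gathered across many prompts span only a hidden-size-dimensional subspace, and the singular values of the stacked matrix give the hidden width away. A self-contained NumPy simulation with synthetic weights (no real API; all numbers are made up):

```python
import numpy as np

vocab, hidden, n_queries = 5000, 256, 1024
W_out = np.random.randn(vocab, hidden)   # stand-in for the model's output projection

def query_logits():
    h = np.random.randn(hidden)          # stand-in for a prompt's final hidden state
    return W_out @ h                     # logits = W_out @ h, rank limited by `hidden`

L = np.stack([query_logits() for _ in range(n_queries)])   # [n_queries, vocab]
s = np.linalg.svd(L, compute_uv=False)
est_hidden = int((s > s[0] * 1e-8).sum())  # count non-negligible singular values
print(est_hidden)                          # ≈ 256: the hidden width is recovered
```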
📢 New security risk for Mixture-of-Experts (MoE)! 📢 @GoogleDeepMind research reveals a new kind of vulnerability that could leak user prompts in MoE models. Our "MoE Tiebreak Leakage" attack exploits the Expert Choice Routing strategy. arxiv.org/pdf/2410.22884
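Expert Choice Routing inverts the usual MoE scheme: each expert picks its top-capacity tokens from the whole batch, so whether a token gets processed depends on the other sequences it is batched with. A toy sketch of that batch-level dependence, with made-up scores; the real attack's CUDA tie-breaking details are more subtle.

```python
import torch

def expert_choice_route(scores, capacity):
    """Expert-choice routing: each expert selects its top-`capacity` tokens.
    The selection depends on the whole batch, so co-batched sequences
    influence each other's routing - the side channel the attack exploits."""
    picked = {}
    for e in range(scores.shape[1]):               # scores: [tokens, experts]
        top = torch.topk(scores[:, e], k=capacity) # tie-breaking is position-dependent
        picked[e] = top.indices.tolist()
    return picked

scores = torch.tensor([[0.9, 0.1],
                       [0.9, 0.2],   # exact tie with token 0 on expert 0
                       [0.3, 0.8]])
print(expert_choice_route(scores, capacity=1))
# {0: [0], 1: [2]}: token 1 loses the tie purely because of its batch position,
# so an attacker observing such drops can infer what else was in the batch.
```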
We have been awarded two Best Paper Awards at @icmlconf 2024 for 'Stealing Part of a Production Language Model' and 'Considerations for Differentially Private Learning with Large-Scale Public Pretraining'! arxiv.org/abs/2403.06634 arxiv.org/abs/2212.06470
It's finally here. Q* rings true. Tiny LLMs are as good at math as a frontier model. By using the same techniques Google used to solve Go (MCTS and backprop), Llama 8B gets 96.7% on the GSM8K math benchmark! That's better than GPT-4, Claude, and Gemini, with 200x fewer parameters!
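For the record, the "backprop" in AlphaGo-style search is the value-backup phase of Monte Carlo Tree Search, not gradient backpropagation. A generic MCTS-over-candidate-answers skeleton, where `refine` (ask the model for a revised answer) and `score` (self-evaluate an answer) are hypothetical callbacks; any concrete system would supply its own versions of both.

```python
import math

class Node:
    def __init__(self, answer, parent=None):
        self.answer, self.parent = answer, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    """Upper-confidence score balancing exploitation and exploration."""
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_answer, refine, score, iterations=100):
    """Select a leaf by UCT, expand it with a refined answer, evaluate it,
    then backpropagate the reward up the tree (the 'backprop' of MCTS)."""
    root = Node(root_answer)
    for _ in range(iterations):
        node = root
        while node.children:                            # selection
            node = max(node.children, key=uct)
        child = Node(refine(node.answer), parent=node)  # expansion
        node.children.append(child)
        reward = score(child.answer)                    # evaluation
        while child:                                    # backpropagation
            child.visits += 1
            child.value += reward
            child = child.parent
    return max(root.children, key=lambda n: n.visits).answer
```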
I'm very interested in learning about other fields that reverse engineer complex systems, & seeing what lessons might transfer to neural networks. I had a great call with Itay Yona, a software/hardware reverse engineering expert; here are some notes: neelnanda.io/mechanistic-in…