Lisa Alazraki
@LisaAlazraki
#ML & #NLProc PhD student @ImperialCollege. Research Scientist Intern @AIatMeta; prev. @Cohere, @GoogleAI. Interested in reasoning, planning, robustness. She/her
Thrilled to share our new preprint on Reinforcement Learning for Reverse Engineering (RLRE) 🚀 We demonstrate that human preferences can be reverse engineered effectively by pipelining LLMs to optimise upstream preambles via reinforcement learning 🧵⬇️
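The tweet describes a pipeline where an upstream LLM is trained with RL to produce preambles that steer a frozen downstream model toward outputs a reward model prefers. Below is a deliberately tiny, self-contained sketch of that structure only (not the paper's code): a toy policy learns, via REINFORCE, to emit "preamble" tokens that score well under a stand-in reward; the vocabulary, target token, and reward are all made up for illustration.

import torch
import torch.nn as nn

VOCAB, PREAMBLE_LEN = 16, 4

class UpstreamPolicy(nn.Module):
    # Tiny autoregressive policy that emits preamble tokens one at a time.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.rnn = nn.GRU(32, 32, batch_first=True)
        self.head = nn.Linear(32, VOCAB)

    def sample(self):
        tokens, logps, h = [], [], None
        inp = torch.zeros(1, 1, dtype=torch.long)  # BOS token id 0
        for _ in range(PREAMBLE_LEN):
            out, h = self.rnn(self.embed(inp), h)
            dist = torch.distributions.Categorical(logits=self.head(out[:, -1]))
            tok = dist.sample()
            tokens.append(tok)
            logps.append(dist.log_prob(tok))
            inp = tok.unsqueeze(0)
        return torch.stack(tokens, 1), torch.stack(logps, 1)

# Stand-in for the frozen downstream LLM plus reward model: the "reward" is just
# how many preamble tokens equal a target id, in place of a preference score.
TARGET = 7
def downstream_reward(preamble):
    return (preamble == TARGET).float().mean()

policy = UpstreamPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for step in range(200):
    preamble, logps = policy.sample()
    reward = downstream_reward(preamble)
    loss = -(reward * logps.sum())  # REINFORCE: reinforce preambles that scored well
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 50 == 0:
        print(step, round(reward.item(), 2))

In the actual setup the reward would come from a learned reward model scoring the downstream model's response, and the policy would be an LLM; the toy only shows where the gradient signal enters.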

I’ll be at #acl2025 next week in Vienna🇦🇹 presenting our work. Looking forward to meeting old and new friends! If you’d like to discuss chatbots, code generation, multi-agent systems, robustness, hate speech detection, or just life😆, feel free to reach out, I’d love to chat!
🙌Happy to share that our paper “DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising” has been accepted to #ACL2025! Many thanks to my co-authors Huichi Zhou, @MarekRei, @lspecia! Arxiv: arxiv.org/abs/2407.00248 GitHub: github.com/Nickeilf/Diffu…
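The title points at the core idea: apply a denoising module several times at inference to pull adversarially perturbed representations back toward clean ones before classifying. A minimal toy sketch of that loop, under my own assumptions (random untrained modules, made-up dimensions, nothing from the released code):

import torch
import torch.nn as nn

DIM = 64
# In the real setup this would be trained to remove noise from encoder hidden
# states; here it is just a random MLP to show where it sits in the pipeline.
denoiser = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
classifier = nn.Linear(DIM, 2)

def defend(hidden: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Apply the denoiser repeatedly before classifying; the intuition is that
    # iterative denoising pulls perturbed features back toward clean ones.
    for _ in range(steps):
        hidden = denoiser(hidden)
    return classifier(hidden)

clean = torch.randn(1, DIM)                    # stand-in for an encoder hidden state
attacked = clean + 0.3 * torch.randn(1, DIM)   # stand-in for an adversarial perturbation
print(defend(attacked))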
ACL is almost here 🎉 Our Imperial NLP community will be presenting several papers at the conference next week. We look forward to seeing everyone in Vienna!
If you are attending ICML 2025 and are interested in causal discovery, I will be presenting our work today at 11am! See you at the East Exhibition Hall, Poster 1303. #icml #icml2025
Excited to share that our paper "Causal Discovery from Conditionally Stationary Time Series" has been accepted to ICML 2025!🥳 Pre-print: arxiv.org/abs/2110.06257 Thank you very much to all my collaborators, persistence pays off! #icml #icml2025
A new paper with @aliceiannntn and @MatteoCasiragh3, analyzing Italian tweets on feminism and tracking toxicity and shifting connotations over time, is being presented today at the LLM for political science workshop at @europsa 📉📈
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
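To make the recipe concrete, here is a purely illustrative sketch (my own made-up function names and prompt formats, not the preprint's data): the model is "programmed" by next-token prediction on program source code alone, and only at test time is it asked to retrieve, evaluate, or compose those programs.

# Training documents contain only code, never input/output examples.
programs = {
    "add_two": "def add_two(x):\n    return x + 2",
    "double":  "def double(x):\n    return x * 2",
}
train_docs = [f"# program: {name}\n{src}\n" for name, src in programs.items()]

# Test-time prompts probe retrieval, evaluation, and composition.
test_prompts = [
    "Recall the source of program add_two:",   # retrieval
    "add_two(5) =",                            # evaluation
    "double(add_two(5)) =",                    # composition
]

for doc in train_docs:
    print(doc)
for p in test_prompts:
    print(p)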
🥳Thrilled to announce that I started as an Assistant Professor @imperialcollege! Grateful to all my mentors and collaborators from @ucl, @UniofOxford, and worldwide! I will continue working on 🌟Alignment and A(G)I Safety🌟 and 🚨 I have funding for PhD students 🚨. I look…
(1/9) LLMs can regurgitate memorized training data when prompted adversarially. But what if you *only* have access to synthetic data generated by an LLM? In our @icmlconf paper, we audit how much information synthetic data leaks about its private training data 🐦🌬️
At @RLDMDublin2025 this week to present our work on incorporating diverse prior knowledge in RL (sample efficiency, safety, interpretability,...) Poster #94 on Thursday Full paper here: arxiv.org/abs/2306.01158 #RLDM2025
How good can privacy attacks against LLM pretraining get if you assume a very strong attacker? Check it out in our preprint ⬇️
Are modern large language models (LLMs) vulnerable to privacy attacks that can determine whether given data was used for training? Models and datasets are quite large, so what should we even expect? Our new paper looks into exactly this question. 🧵 (1/10)
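For readers unfamiliar with this attack family, the weakest common baseline is a loss-based membership-inference test: score candidate texts by the model's per-token loss and flag unusually low-loss texts as likely training members. A minimal sketch with Hugging Face transformers (model choice, threshold, and example strings are my own illustrative assumptions, not the paper's attack, and it requires downloading GPT-2):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def per_token_loss(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)   # causal LM loss = mean NLL per token
    return out.loss.item()

candidates = [
    "The quick brown fox jumps over the lazy dog.",  # common string, likely low loss
    "zqxjv wpmtr flrgh ksdnb ouyce",                 # gibberish, likely high loss
]
THRESHOLD = 4.0                                      # illustrative cutoff only
for text in candidates:
    loss = per_token_loss(text)
    print(f"{loss:.2f}  member? {loss < THRESHOLD}  |  {text[:40]}")

Stronger attackers, as in the preprint, can do considerably better than this single-threshold baseline.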
🚨 New preprint alert – accepted at #ACL2025 Findings!🚨
If anyone has done any work improving the robustness of NLI models and we didn't cite you in our appendix, please share a link to your work - I would love to include it. Our appendix related-work section on NLI robustness is a bit of a monster 😅🧌 x.com/_joestacey_/st…
This work was really fun and a great last paper for my PhD. Check it out 🙂 arxiv.org/abs/2505.20209 P.S. if you know about a paper improving NLI model robustness not already in our related work appendix, I would love to hear about it🥰
If you're interested in NLI or model robustness, this is a nice and fun paper - check it out 🙂 x.com/_joestacey_/st…
We have a new paper up on arXiv! 🥳🪇 The paper tries to improve the robustness of closed-source LLMs fine-tuned on NLI, assuming a realistic training budget of 10k training examples. Here's a 60 second rundown of what we found!
Can LLMs be incentivised to generate token sequences (in this case preambles) that condition downstream models to improve performance when judged by reward models? Yes! ✅