EdinburghNLP
@EdinburghNLP
The Natural Language Processing Group at the University of Edinburgh. BFF with @Imperial_NLP
Join our PhD programme in Designing Responsible Natural Language Processing at the UKRI AI Centre for Doctoral Training, University of Edinburgh. Applications are now re-opened for Home fee status candidates (past candidates need not re-apply). responsiblenlp.org
🚨 New AI Threat Alert: Multilingual LLMs can secretly transfer backdoors from one language to many others. Spanish in, Chinese out, maliciously. Come see how at our poster: 🗓 Today (07/28), 18:00–19:30 📍 Hall 4/5 #ACL2025 #AIsecurity #LLMsafety
🚨 New Paper! (arxiv.org/abs/2404.19597)🚨 We uncover significant vulnerabilities in Multilingual LLMs (MLLMs) (e.g., BLOOM, Llama2, Llama3, Gemma, and GPT-3.5-turbo) to cross-lingual transferable backdoor attacks. #AIsafety #LLMs #backdoors
Inverse Scaling in Test-Time Compute: "We identify five distinct failure modes when models reason for longer: 1) Claude models become increasingly distracted by irrelevant information; 2) OpenAI o-series models resist distractors but overfit to problem framings; 3) models shift…
Anthropic just released a research paper: Inverse Scaling in Test-Time Compute. This study shows that longer reasoning in Large Reasoning Models (LRMs) can hurt performance, revealing a surprising inverse scaling between reasoning length and accuracy. According to this paper,…
#Anthropic’s new paper on inverse scaling at test time is a must-read! 👏 @aryopg @PMinervini @yanda_chen_ @EthanJPerez In our recent work, we found a twist (on a math task): it’s not strictly inverse: performance goes up ⬆️ then down ⬇️. Parallel thinking might fix it. Curious? Link 👇
Recent paper by #Anthropic @aryopg @PMinervini @yanda_chen_ @EthanJPerez Inverse Scaling in Test-Time Compute: arxiv.org/abs/2507.14417. It validates findings from our work published last month: Does test-time scaling always help? x.com/SOURADIPCHAKR1…
*The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs* by @p_nawrot @PontiEdoardo @cheeesio @seb_ruder They study sparse attention techniques at scale, comparing to small dense models at the same compute budget. arxiv.org/abs/2504.17768
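For readers new to the topic, here is a minimal top-k sparse attention sketch in PyTorch. It is just one common sparsification pattern, not the specific methods benchmarked in the paper, and all names are illustrative.

```python
# Minimal sketch of top-k sparse attention (illustrative only, not the paper's code).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_keep=64):
    """q, k, v: (batch, heads, seq_len, head_dim). Keep only the k_keep
    highest-scoring keys per query and mask out the rest before softmax."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5    # (b, h, Lq, Lk)
    k_keep = min(k_keep, scores.shape[-1])
    topk_idx = scores.topk(k_keep, dim=-1).indices           # top-k keys per query
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, topk_idx, 0.0)                         # 0 where kept, -inf elsewhere
    return F.softmax(scores + mask, dim=-1) @ v

# Toy usage: the same inputs could be fed to dense attention for a quality/compute comparison.
b, h, L, d = 1, 8, 1024, 64
q, k, v = torch.randn(b, h, L, d), torch.randn(b, h, L, d), torch.randn(b, h, L, d)
sparse_out = topk_sparse_attention(q, k, v, k_keep=64)
```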
The amazing folks at @EdinburghNLP will be presenting a few papers at ACL 2025 (@aclmeeting); if you're in Vienna, touch base with them! Here are the papers in the main track 🧵
'Theorem Prover as a Judge for Synthetic Data Generation' has been accepted to ACL (Main) 🚀. Do check us out on July 30th (Wednesday), 11:00–12:30pm at Hall 4/5! A huge thank you to my amazing collaborators: Shay @GiwonHong413849 @WendaLi8 📝: aclanthology.org/2025.acl-long.…
New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
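A rough sketch of how such an inverse-scaling curve can be measured: evaluate the same tasks at several reasoning-token budgets and watch whether accuracy drops as the budget grows. `ask_model` and `TASKS` below are hypothetical placeholders, not Anthropic's evaluation harness.

```python
# Illustrative sketch of measuring accuracy vs. test-time reasoning budget.

def ask_model(question: str, max_reasoning_tokens: int) -> str:
    """Placeholder for a call to a reasoning model with a capped thinking budget."""
    return ""  # replace with a real model/API call

TASKS = [("What is 17 * 24?", "408")]  # toy evaluation item

def accuracy_at_budget(budget: int) -> float:
    correct = sum(ask_model(q, max_reasoning_tokens=budget).strip() == a for q, a in TASKS)
    return correct / len(TASKS)

# Inverse scaling shows up when accuracy *drops* as the budget grows.
for budget in (256, 1024, 4096, 16384):
    print(budget, accuracy_at_budget(budget))
```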
Why do AI assistants feel so generic? Our new #ACL2025 paper, PersonaLens🔎, tackles this head-on. We built a new benchmark to test personalization in ways that matter. I'll be presenting our work at the poster session in Vienna next week! 🧵[1/4]
🏆 Our @nvidia KV Cache Compression Leaderboard is now live! Compare state-of-the-art compression methods side-by-side with KVPress. See which techniques are leading in efficiency and performance. 🥇 huggingface.co/spaces/nvidia/…
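For context, a toy sketch of the general score-and-prune idea behind many KV-cache compression methods: drop cached key/value pairs that recent queries attend to least. This is not the KVPress API; all names are illustrative.

```python
# Toy KV-cache pruning sketch (illustrative, not KVPress).
import torch

def prune_kv_cache(keys, values, recent_queries, keep_ratio=0.5):
    """keys, values: (heads, cache_len, head_dim); recent_queries: (heads, q_len, head_dim).
    Keep the fraction of cache positions receiving the most attention mass."""
    scores = recent_queries @ keys.transpose(-2, -1)          # (heads, q_len, cache_len)
    importance = scores.softmax(dim=-1).sum(dim=(0, 1))       # total mass per cache position
    n_keep = max(1, int(keep_ratio * keys.shape[1]))
    keep = importance.topk(n_keep).indices.sort().values      # preserve original order
    return keys[:, keep], values[:, keep]

# Usage on random tensors
h, L, d = 8, 512, 64
keys, values, queries = torch.randn(h, L, d), torch.randn(h, L, d), torch.randn(h, 32, d)
small_k, small_v = prune_kv_cache(keys, values, queries, keep_ratio=0.25)
```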
Many thanks to the @ActInterp organisers for highlighting our work - and congratulations to Pedro, Alex and the other awardees! Sad not to have been there in person, it looked like a fantastic workshop. @AmsterdamNLP @EdinburghNLP
Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇
I'll be hiring a couple of Ph.D. students at CMU (via LTI or MLD) in the upcoming cycle! If you are interested in joining my group, please read the FAQ before reaching out to me via email :) docs.google.com/document/d/12V…
Are you interested in the intersection of Mathematics and NLP? Consider submitting your paper to #MathNLP 2025: The 3rd Workshop on Mathematical NLP. #EMNLP2025. Submissions will open on June 25! Take a look here for more details sites.google.com/view/mathnlp20…
Transformers struggle with length generalization and long context. What can we do about it? Our new #TMLR paper with @rolandalong, @paul_smolensky and @JianfengGao0217 shows how to handle the issue using a new attention mechanism called TRA. Curious? Read the 🧵 for more 🤓
Thanks to everyone who stopped by our work! If you missed it and want to know more, just drop me a message! #ICML2025
Spotlight poster coming soon at #ICML2025 @icmlconf! 📌East Exhibition Hall A-B E-1806 🗓️Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT 📜arxiv.org/pdf/2410.12537 Let’s chat! I’m always up for conversations about knowledge graphs, reasoning, neuro-symbolic AI, and benchmarking.
We blend imitation (SFT) and exploration (RLVR) in post-training with a simple idea: Sample a prefix of an SFT demonstration, let your policy model complete it, and mix it with other RLVR rollouts Intuitively, the model relies more on hints for problems currently out of reach
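A minimal sketch of that recipe, with placeholder helpers (`policy_complete`, `reward_fn`) standing in for the actual policy model and verifiable reward; this is not the paper's implementation, just the prefix-and-mix idea.

```python
# Sketch of mixing prefix-guided completions with plain RLVR rollouts.
import random

def sample_prefix_hint(solution: str) -> str:
    """Take a random-length prefix of an SFT demonstration to use as a hint."""
    cut = random.randint(0, len(solution))   # longer prefix = stronger hint
    return solution[:cut]

def make_mixed_batch(policy_complete, reward_fn, demos, prompts, prefix_fraction=0.25):
    """policy_complete(text) -> str and reward_fn(prompt, response) -> float
    are placeholders for the policy model and the verifiable reward."""
    batch = []
    for prompt, solution in demos:
        if random.random() < prefix_fraction:
            hint = sample_prefix_hint(solution)
            response = hint + policy_complete(prompt + hint)
            batch.append((prompt, response, reward_fn(prompt, response)))
    for prompt in prompts:                    # ordinary on-policy RLVR rollouts
        response = policy_complete(prompt)
        batch.append((prompt, response, reward_fn(prompt, response)))
    return batch

# Toy usage with dummy stand-ins for the policy and reward
batch = make_mixed_batch(lambda text: " ...", lambda p, r: 0.0,
                         demos=[("Prove 1+1=2.", "By the Peano axioms, ...")],
                         prompts=["Solve x^2 = 4."])
```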
🚀 Introducing Prefix-RFT to blend SFT and RFT! SFT can learn more complex problems by mimicking, but can have poor generalization. RFT has better overall performance but is limited by the initial policy. Our method, Prefix-RFT, makes the best of both worlds!
I hope somebody mentioned pixel-based models: arxiv.org/abs/2401.03321 @tetraduzione
most controversial statement so far from @alisawuffles: "tokenization research is not as cool" **very vocal disagreement from the crowd of tokenization nerds**
🚨New paper alert!🚨 "Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them" @ActInterp ICML'25 @deepseek_ai popularised RLVR and distillation for 'reasoning training'! But how do they differ under the hood? Details in 🧵: (1/8)
🚨Is complex query answering really complex?🚨 unfortunately not! the current benchmarks boil down to link prediction 98% of the time... how to fix this??? 👇👇👇 📜arxiv.org/abs/2410.12537 with @c_gregucci @BoXiongs @loreloc_ @PMinervini @ststaab
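A toy illustration (not the paper's code) of what "boils down to link prediction" can mean: when one hop of a 2-hop query is already covered by a triple seen at training time, only a single missing link is actually left to predict.

```python
# Known at training time:
train_triples = {("alan_turing", "born_in", "london")}

# "Complex" 2-hop test query: in which country was Alan Turing born?
#   ?city    : born_in(alan_turing, ?city)
#   ?country : located_in(?city, ?country)

# Hop 1 is already answered by a memorised training triple...
city = next(c for (h, r, c) in train_triples if h == "alan_turing" and r == "born_in")

# ...so the "complex" query collapses to a single link-prediction call:
print("Effective task: predict", (city, "located_in", "?"))
```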