Edoardo Ponti
@PontiEdoardo
Assistant Professor in #NLP at @EdinburghUni | A Kleene star shines on the hour of our meeting
Reach out to @yifuqiu98 if you’re looking for a research scientist starting next year! He is extremely talented and he’s been doing fantastic research on world models inside general-purpose LLMs/VLMs
Most importantly, I will be on the job market for 2026. If you have any research positions on language grounding (world models), hallucinations, or safety for foundation models, let's discuss!
The amazing folks at @EdinburghNLP will be presenting a few papers at ACL 2025 (@aclmeeting); if you're in Vienna, touch base with them! Here are the papers in the main track 🧵
*The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs* by @p_nawrot @PontiEdoardo @cheeesio @seb_ruder. They study sparse attention techniques at scale, comparing them to small dense models at the same compute budget. arxiv.org/abs/2504.17768
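For readers unfamiliar with the family of methods studied, here is a purely illustrative sketch of one content-based variant (top-k key selection); it is not the paper's implementation, and for clarity it materializes the dense score matrix, which real sparse attention methods avoid.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_keep):
    """Toy content-based sparse attention: each query attends only to its
    k_keep highest-scoring keys instead of the full sequence.
    Shapes: q, k, v are (seq_len, d_head). Illustration only."""
    scores = q @ k.T / k.shape[-1] ** 0.5        # (L, L) full attention scores
    topk = scores.topk(k_keep, dim=-1).indices   # kept key indices per query
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, topk, 0.0)                 # 0 where kept, -inf elsewhere
    attn = F.softmax(scores + mask, dim=-1)
    return attn @ v

L, d = 128, 64
q, k, v = (torch.randn(L, d) for _ in range(3))
out = topk_sparse_attention(q, k, v, k_keep=16)  # each query sees 16 of 128 keys
```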
We blend imitation (SFT) and exploration (RLVR) in post-training with a simple idea: sample a prefix of an SFT demonstration, let your policy model complete it, and mix it with the other RLVR rollouts. Intuitively, the model relies more on hints for problems currently out of reach.
🚀 Introducing Prefix-RFT to blend SFT and RFT! SFT can learn more complex problems by imitation but may generalize poorly; RFT has better overall performance but is limited by the initial policy. Our method, Prefix-RFT, gets the best of both worlds!
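A rough sketch of the prefix idea in the two posts above, with placeholder names rather than the paper's code: truncate an SFT demonstration at a random point, let the policy finish it, and mix those completions with ordinary RLVR rollouts before scoring everything with the verifiable reward.

```python
import random

def build_rollouts(problem, sft_demo, policy, n_rollouts=8, p_prefix=0.25):
    """Illustrative sketch of Prefix-RFT's sampling step (placeholder names).
    A fraction of rollouts start from a truncated SFT demonstration that the
    policy must finish; the rest are plain RLVR rollouts from scratch."""
    rollouts = []
    for _ in range(n_rollouts):
        if sft_demo and random.random() < p_prefix:
            cut = random.randint(1, len(sft_demo))   # random prefix length
            prefix = sft_demo[:cut]                  # hint from the demonstration
        else:
            prefix = ""                              # ordinary exploration rollout
        completion = policy(problem + prefix)        # policy continues from the hint
        rollouts.append(prefix + completion)
    return rollouts

# Dummy usage with a stub policy (a real setup would call an LLM here).
demo = "Step 1: factor the quadratic. Step 2: set each factor to zero. Answer: x=2,3"
print(build_rollouts("Solve x^2-5x+6=0. ", demo, policy=lambda prompt: " ...model text"))
```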
If you are at @icmlconf make sure to attend @AdrianLancucki’s invited talk on our inference-time *hyper*-scaling paper (and more!) at the tokenization workshop this Friday tokenization-workshop.github.io/schedule/
🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget. This unlocks *inference-time hyper-scaling*: for the same runtime or memory load, we can boost LLM accuracy by pushing reasoning even further!
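The actual method (DMS) learns the compression end to end; the toy sketch below, with placeholder names and an assumed scoring rule, only illustrates the underlying trade-off: evicting low-importance entries from the KV cache frees memory and compute that can be spent on generating more reasoning tokens.

```python
import torch

def evict_kv(keys, values, scores, keep_ratio=0.125):
    """Toy KV-cache compression (not the actual DMS method): keep only the
    fraction of cached key/value pairs with the highest importance scores,
    so a longer generation fits in the same memory budget.
    keys, values: (seq_len, d); scores: (seq_len,) importance per token."""
    k_keep = max(1, int(keys.shape[0] * keep_ratio))
    kept = scores.topk(k_keep).indices.sort().values   # keep original token order
    return keys[kept], values[kept]

seq_len, d = 1024, 128
keys, values = torch.randn(seq_len, d), torch.randn(seq_len, d)
scores = torch.rand(seq_len)                           # e.g. accumulated attention mass
small_k, small_v = evict_kv(keys, values, scores)      # 8x smaller cache
```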
Thanks for acknowledging Dynamic Token Pooling as a predecessor to H-Net, @_albertgu! We had some decent ideas in that paper (e2e and entropy-based tokenisation), but it surprises me that it took 2 years (an eternity in NLP) to find the right recipe and scale better than BPE
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
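As a rough illustration of what entropy-based boundaries mean (a sketch in the spirit of Dynamic Token Pooling, not H-Net's actual recipe; `next_char_probs` is a stand-in for a small predictive model): chunk boundaries are placed wherever the next symbol becomes hard to predict, which tends to coincide with word or morpheme edges.

```python
import math

def entropy_chunks(text, next_char_probs, threshold=2.5):
    """Toy entropy-based dynamic chunking. next_char_probs(prefix) returns a
    dict mapping each possible next character to its probability; a boundary
    is inserted wherever the predictive entropy exceeds the threshold."""
    chunks, start = [], 0
    for i in range(1, len(text)):
        probs = next_char_probs(text[:i])
        entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
        if entropy > threshold:          # uncertain -> start a new chunk here
            chunks.append(text[start:i])
            start = i
    chunks.append(text[start:])
    return chunks
```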
The 4th Cardiff #NLProc Summer Workshop starts tomorrow! We'll have two full days of insightful talks, hands-on sessions, and networking. 📅 Check out the full schedule here: cardiffnlpworkshop.org/schedule
I thoroughly enjoyed reading @vernadankers's dissertation; my personal highlight was her idea of maps that track the training memorisation versus test generalisation of each example. I wish you all the best for the upcoming postdoc with @sivareddyg and his wonderful group!
I miss Edinburgh and its wonderful people already!! Thanks to @tallinzen and @PontiEdoardo for inspiring discussions during the viva! I'm now exchanging Arthur's Seat for Mont Royal to join @sivareddyg's wonderful lab @Mila_Quebec 🤩
Test-time scaling is all over the place right now. Here we try to pack knowledge of particular documents into LoRAs ("knowledge modules") by performing expensive computation offline, so that test-time computation is quick; I see this as precomputing and storing possible…
RAG and in-context learning are the go-to approaches for integrating new knowledge into LLMs, but they make inference very inefficient. We propose instead 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗠𝗼𝗱𝘂𝗹𝗲𝘀: lightweight LoRA modules trained offline that can match RAG performance without the drawbacks.
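A minimal sketch of the context-distillation idea behind these knowledge modules, assuming logit matching only (the paper's deep variant is richer) and assuming each callable returns logits for the query's token positions; all names are placeholders, not the paper's API.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher_logits_fn, student_logits_fn, optimizer, document, query):
    """Illustrative context-distillation step. The teacher conditions on the
    document; the student runs with only its LoRA module active and no document
    in context, and is trained to match the teacher's next-token distributions.
    Both callables are assumed to return logits of shape (num_query_tokens, vocab);
    the optimizer is assumed to hold only the LoRA parameters."""
    with torch.no_grad():
        teacher_logits = teacher_logits_fn(document + "\n" + query)  # doc in context
    student_logits = student_logits_fn(query)                        # no doc, LoRA on
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```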
We built sparse-frontier — a clean abstraction that lets you focus on your custom sparse attention implementation while automatically inheriting vLLM’s optimizations and model support. As a PhD student, I've learned that sometimes the bottleneck in research isn't ideas — it's…
NVIDIA Researchers Introduce Dynamic Memory Sparsification (DMS) for 8× KV Cache Compression in Transformer LLMs: As the demand for reasoning-heavy tasks grows, large language models (LLMs) are increasingly expected to generate longer sequences or parallel chains of reasoning…
Cool use of our AURORA work from last year to improve physical world models framed as image editing!
🔁 What if you could bootstrap a world model (state1 × action → state2) using a much easier-to-train dynamics model (state1 × state2 → action) in a generalist VLM? 💡 We show how a dynamics model can generate synthetic trajectories & serve for inference-time verification 🧵👇
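A hypothetical sketch of the inference-time verification step described above (all names are placeholders): sample several candidate next states from the world model and keep the one whose inferred action, according to the dynamics model, is most consistent with the action actually taken.

```python
def verify_next_state(world_model, dynamics_model, state1, action, n_candidates=4):
    """Illustrative only: world_model(state1, action) proposes a candidate next
    state; dynamics_model.score(state1, state2, action) rates how well the state
    pair explains the action. Return the best-explained candidate."""
    candidates = [world_model(state1, action) for _ in range(n_candidates)]
    return max(candidates, key=lambda s2: dynamics_model.score(state1, s2, action))
```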
[LG] Training Plug-n-Play Knowledge Modules with Deep Context Distillation. L Caccia, A Ansell, E Ponti, I Vulić... [Microsoft Research Montreal & University of Cambridge] (2025). arxiv.org/abs/2503.08727