José Maria Pombal
@zmprcp
Senior Research Scientist @swordhealth, PhD student @istecnico.
Our pick of the week by @apierg: "Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation" by José Pombal, Nuno M. Guerreiro, @RicardoRei7, and @andre_t_martins (2025). #mt #translation #metric #machinetranslation
Brilliant and necessary work by @zmprcp et al. about metric interference in MT system development and evaluation: arxiv.org/abs/2503.08327 Are we developing better systems or are we just gaming the metrics? And how do we address this? Super (m)interesting! 👀
Last week was my final one at @Unbabel. I'm incredibly proud of our work (e.g., Tower, MINT, M-Prometheus, ZSB). Now, alongside my PhD studies at @istecnico, I'm joining @swordhealth as Senior Research Scientist under @RicardoRei7. Super confident in the team we're assembling.
🚨Meet MF²: Movie Facts & Fibs: a new benchmark for long-movie understanding! 🤔Do you think your model understands movies? Unlike existing benchmarks, MF² targets memorable events, emotional arcs 💔, and causal chains 🔗 — things humans recall easily, but even top models like…
Check out the latest iteration of Tower models, Tower+. Ideal for translation tasks and beyond, and available at three different scales: 2B, 9B, 72B. All available on huggingface: huggingface.co/collections/Un… Kudos to everyone involved!
🚀 Tower+: our latest model in the Tower family — sets a new standard for open-weight multilingual models! We show how to go beyond sentence-level translation, striking a balance between translation quality and general multilingual capabilities. 1/5 arxiv.org/pdf/2506.17080
🙁 LLMs are overconfident even when they are dead wrong. 🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”? ❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.
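For readers curious how verbalized confidences like "60% likely to be correct" are typically scored, here is a minimal Python sketch of expected calibration error (ECE). This is a hedged illustration only, not the paper's evaluation code; the prediction data below is made up.

```python
# Minimal sketch of expected calibration error (ECE) over verbalized
# confidences. Illustrative only: the data below is made up and this
# is not the paper's evaluation code.

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by stated confidence and average the gap between
    mean confidence and empirical accuracy in each bin, weighted by
    bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece

# Hypothetical model outputs: a verbalized confidence ("My answer is
# only 60% likely to be correct" -> 0.6) paired with whether the
# answer was actually right.
confidences = [0.95, 0.60, 0.80, 0.99, 0.70, 0.55]
correct = [True, False, True, True, False, True]
print(f"ECE: {expected_calibration_error(confidences, correct):.3f}")
```
A well-calibrated model's stated confidences track its empirical accuracy, so lower ECE is better.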
MT metrics excel at evaluating sentence translations, but struggle with complex texts. We introduce *TREQA*, a framework to assess how translations preserve key info by using LLMs to generate & answer questions about them: arxiv.org/abs/2504.07583 (co-lead @swetaagrawal20) 1/15
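For a sense of what QA-based evaluation looks like in practice, here is a schematic Python sketch of the idea behind TREQA. This is not the actual framework: `ask_llm` is a hypothetical placeholder for whatever LLM client you use, and the prompts and exact-match answer criterion are illustrative only.

```python
# Schematic sketch of QA-based translation evaluation in the spirit of
# TREQA. Not the actual framework: ask_llm is a hypothetical placeholder
# for an LLM call, and the prompts are illustrative.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def generate_questions(reference: str, n: int = 5) -> list[str]:
    # Ask an LLM for questions whose answers are key facts in the text.
    prompt = (
        f"Write {n} factual questions answerable from this text, "
        f"one per line:\n{reference}"
    )
    return ask_llm(prompt).splitlines()

def answer_from(text: str, question: str) -> str:
    return ask_llm(f"Answer using only this text:\n{text}\n\nQ: {question}")

def qa_preservation_score(reference: str, translation: str) -> float:
    """Fraction of reference-derived questions that the candidate
    translation answers the same way the reference does."""
    questions = generate_questions(reference)
    matches = 0
    for q in questions:
        gold = answer_from(reference, q)
        hyp = answer_from(translation, q)
        # A real system would use a softer answer-matching criterion
        # than exact string equality.
        matches += gold.strip().lower() == hyp.strip().lower()
    return matches / len(questions)
```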
Introducing M-Prometheus — the latest iteration of the open LLM judge, Prometheus! Specially trained for multilingual evaluation. Excels across diverse settings, including the challenging task of literary translation assessment.
We just released M-Prometheus, a suite of strong open multilingual LLM judges at 3B, 7B, and 14B parameters! Check out the models and training data on Huggingface: huggingface.co/collections/Un… and our paper: arxiv.org/abs/2504.04953
Here's our new paper on M-Prometheus, a series of multilingual judges! 1/ Effective at safety & translation eval 2/ Also stands out as a good reward model in BoN 3/ Backbone model selection & training on natively multilingual data is important. Check out @zmprcp's post!
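Point 2/ above refers to Best-of-N (BoN) sampling: generate N candidate responses and keep the one the judge scores highest. A minimal hedged sketch of BoN reranking follows; `sample` and `judge_score` are generic placeholders, not the actual M-Prometheus interface.

```python
# Minimal Best-of-N (BoN) reranking sketch: sample N candidates and keep
# the one an LLM judge scores highest. Illustrative only; sample and
# judge_score are placeholders, not the actual M-Prometheus prompt or
# interface.
from typing import Callable

def best_of_n(
    prompt: str,
    sample: Callable[[str], str],              # e.g. a wrapped generate call
    judge_score: Callable[[str, str], float],  # judge(prompt, response) -> score
    n: int = 8,
) -> str:
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: judge_score(prompt, c))
```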
.@Unbabel exposes 🔎 how using the same metrics for both training and evaluation can create misleading ⚠️ #machinetranslation performance estimates and proposes how to solve this with MINTADJUST. @zmprcp @RicardoRei7 @andre_t_martins #translation #xl8 #MT slator.ch/UnbabelBiasAIT…
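As a toy illustration of the interference effect described above (not MINTADJUST itself), the following Python simulation shows why selecting outputs with the same metric you later evaluate with inflates the estimated gain relative to an independent metric.

```python
# Toy simulation of metric interference: if you pick the best candidate
# according to metric A and then also evaluate with metric A, the gain
# looks larger than under an independent metric B that measures the same
# underlying quality with its own noise. Illustrative only; this is not
# MINTADJUST.
import random

random.seed(0)
N_SENTENCES, N_CANDIDATES = 1000, 8
gain_same, gain_other = 0.0, 0.0

for _ in range(N_SENTENCES):
    # Each candidate has a true quality; metrics A and B observe it
    # with independent noise.
    true_q = [random.gauss(0, 1) for _ in range(N_CANDIDATES)]
    metric_a = [q + random.gauss(0, 1) for q in true_q]
    metric_b = [q + random.gauss(0, 1) for q in true_q]
    best = max(range(N_CANDIDATES), key=lambda i: metric_a[i])
    baseline = 0  # an arbitrary non-optimized candidate
    gain_same += metric_a[best] - metric_a[baseline]
    gain_other += metric_b[best] - metric_b[baseline]

print(f"gain by metric A (used for selection): {gain_same / N_SENTENCES:.2f}")
print(f"gain by independent metric B:          {gain_other / N_SENTENCES:.2f}")
```
Metric A's noise is partly what got the winning candidate selected, so evaluating with A double-counts that noise; the independent metric B reports a smaller, more honest gain.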