Dan Deutsch

@_danieldeutsch

Research Scientist at Google Translate working on text generation evaluation

San Francisco

Joined September 2012

89 Following

610 Followers

Pinned

Dan Deutsch@_danieldeutsch · Dec 10, 2023

Excited to receive an Outstanding Paper award for this work at @emnlpmeeting! Thanks to my co-authors George Foster and @markuseful! Updated version available here: aclanthology.org/2023.emnlp-mai…

Dan Deutsch@_danieldeutsch · May 24, 2023

LLM-based metrics like GEMBA predict many ties, but the way that ties should be handled in Kendall’s tau for meta-evaluating metrics has been a longstanding issue. We propose an update to the meta-evaluation methodology to handle ties. arxiv.org/pdf/2305.14324…

12.0K
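To make the tie issue in the quoted post concrete, here is a minimal sketch that compares two standard Kendall's tau variants (tau-a and tau-b) on made-up scores. It does not implement the update proposed in the linked paper; the score lists and function names are hypothetical and only show that the variants can rank a tie-heavy metric differently.

```python
import math
from itertools import combinations


def pair_counts(x, y):
    """Count concordant, discordant, and tied pairs between two score lists."""
    concordant = discordant = tied_x_only = tied_y_only = tied_both = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            tied_both += 1
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    return concordant, discordant, tied_x_only, tied_y_only, tied_both


def tau_a(x, y):
    """Tau-a: every pair stays in the denominator, so ties pull the score down."""
    c, d, tx, ty, txy = pair_counts(x, y)
    return (c - d) / (c + d + tx + ty + txy)


def tau_b(x, y):
    """Tau-b: pairs tied in x or in y are removed from the matching denominator term."""
    c, d, tx, ty, _ = pair_counts(x, y)
    return (c - d) / math.sqrt((c + d + tx) * (c + d + ty))


# Hypothetical data: human scores without ties, a coarse metric that
# predicts many ties, and a fine-grained metric with two swapped pairs.
human = [1, 2, 3, 4, 5, 6]
coarse_metric = [0, 0, 0, 1, 1, 1]
fine_metric = [1, 3, 2, 4, 6, 5]

for name, scores in [("coarse", coarse_metric), ("fine", fine_metric)]:
    print(name, round(tau_a(human, scores), 3), round(tau_b(human, scores), 3))
```

On this toy data, tau-a prefers the fine-grained metric while tau-b prefers the tie-heavy one, so the choice of variant alone can flip a metric ranking.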

Dan Deutsch@_danieldeutsch · Dec 12

🚀 We have just released bfloat16 variants of all 3 MetricX-24 models, offering nearly identical performance to their float32 counterparts, but with a 50% smaller memory footprint. ✨ We hope this makes the XL and XXL models more accessible! 🔗 GitHub: github.com/google-researc…

Jurik Juraska@JurikJuraska · Dec 3

🌐 Meet MetricX-24, our SOTA machine translation evaluation metric and a successor to the successful MetricX-23. 🚀 Now open-source in PyTorch/Transformers! 🎉 Ready to take this top performer in the WMT24 Metrics Shared Task for a spin? 🔗 Code: github.com/google-researc…

331

Dan Deutsch Retweeted

Jurik Juraska@JurikJuraska · Dec 3

2.0K

Dan Deutsch Retweeted

iseeaswell꩜bʂky@iseeaswell · Jun 17

Working on low-resource languages? Want to help with SMOL? Join our new Discord! discord.gg/YFTv7tkh

391

Dan Deutsch Retweeted

Markus Freitag@markuseful · Feb 19

Two new datasets from Google Translate targeting high and low resource languages!
WMT24++: 46 new en->xx languages to WMT24, bringing the total to 55
SMOL: 6M tokens for 115 very low-resource languages
WMT24++: huggingface.co/datasets/googl…
SMOL: huggingface.co/datasets/googl…

15.0K

Dan Deutsch Retweeted

iseeaswell꩜bʂky@iseeaswell · Feb 19

😼 SMOL DATA ALERT! 😼 Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301 Huggingface: huggingface.co/datasets/googl…

4.0K

Dan Deutsch Retweeted

Yusuf Kocyigit@mykocyigit · Feb 6

Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 models at 1B and 8B scales with various contamination types, using machine translation as our task, and analyzed the impact of contamination. arxiv.org/abs/2501.18771

11.0K

Dan Deutsch@_danieldeutsch · Nov 26

Super simple and effective way of significantly increasing the performance of your evaluation metric!

Mara Finkelstein@marafinkels · Nov 26

LLMs are typically evaluated w/ automatic metrics on standard test sets, but metrics + test sets are developed independently. This raises a crucial question: Can we design automatic metrics specifically to excel on the test sets we prioritize? Answer: Yes! arxiv.org/abs/2411.15387

877

Dan Deutsch@_danieldeutsch · Nov 19

The Google Translate Research Team is looking for interns this summer! Apply here if you will graduate from a PhD program in the 2025-2026 academic year, and send me an email to let me know that you applied google.com/about/careers/…

187

150

35.0K

Dan Deutsch@_danieldeutsch · Nov 12

New application link! google.com/about/careers/… I am at EMNLP/WMT this week. Please come find me if you want to learn more about this role!

Dan Deutsch@_danieldeutsch · Oct 18

Interested in doing research on Google Translate and Gemini? Good news! I’m hiring for full-time roles on the Google Translate Research Team! Apply here: google.com/about/careers/…

5.0K