Ehud Reiter
@EhudReiter
I am a computer scientist who works on natural language generation and evaluation, often in healthcare contexts. I teach at Aberdeen University.
Really happy that this survey of NLP in cancer care, from my student @MengxuanS, has finally appeared (it's been a saga). One key but depressing finding is that evaluation quality is uniformly dreadful by medical standards; NLP researchers just don't seem to care...
📢New publication in Artificial Intelligence in Medicine! Our PhD student @MengxuanS alongside @rosadamaberdeen @EhudReiter & @LF_Duncan have published a #scopingreview on the role of natural language processing in #cancer care. sciencedirect.com/science/articl…
I've been yapping for months about bad evaluation setups and how results/AI behaviors are reported, and this new @AISecurityInst paper does so much more clearly. In short: There's a massive difference between showing a model can do something sketchy versus showing it tends to…
😈 Here's why you should not worry that models will start blackmailing you out of nowhere: 1. At their heart, LLMs are pattern-matching and prediction engines. Given an input, they predict the most statistically likely continuation based on the vast dataset they were trained on.…
My daughter Naomi is visiting us in Aberdeen. We don't have a mikvah, so she is using the river to toivel her kitchen equipment

While we’re building amazing new human-AI systems, how do we actually know if they work well for people? In our #ACL2025 Findings Paper, we introduce SPHERE, a framework for making evaluations of human-AI systems more transparent and replicable. ✨aclanthology.org/2025.findings-…
Nice to see our research project in the news!
📢 Research in the news! The ASICA project, led by Prof Peter Murchie and funded by @CR_UK has been featured in the @Telegraph. ASICA is a smartphone app 📱 designed to make skin checks after #melanoma easier and more efficient. Read the article here: telegraph.co.uk/science/cancer…
I'll be at ACL next week (Tue-Thur, not Sun/Mon). Look forward to meeting old friends and new people who want to connect! I'll also be giving an invited talk on impact evaluation at the GEM workshop on Thur 31 July
I was in international math olympiad in 1978 (got silver, not gold). Strange to see LLM world talk about contest for teenagers who love math, but I guess good publicity! Although if 2025 IMO is like 1978, there is a high risk of data contamination; it's a contest for teens, not LLMs
Chat with local clinician about health apps used by patients to self-diagnose problems. This caused flood of "worried well" asking for help (perhaps because apps super-cautious about false neg); bad for patients (high anxiety), bad for (overloaded) UK health system
Motivated by recent discussion with my group: Ignore subjective statements such as "I find LLMs to be incredibly useful for XX", especially when made by people (such as AI companies or gurus) who have strong biases/incentives/COI.
Talk on deep-fake detection (vision) ME: How evaluate detector? SPEAK: Perf on standard test sets ME: Does this predict real-world usefulness? What if creators change fakes? S: Test set perf may not mean much, but it's what academic venues care about. Weak eval isn't just in NLP...
Personal blog about my holidays, nothing to do with NLG or AI: Cycling in Netherlands ehudreiter.com/2025/07/13/cyc…
Nice example of using an RCT to measure real-world impact of LLM tech
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Ever experienced cancer? We are doing a research study about using Artificial Intelligence to improve quality of life questionnaires. We would like to invite you to complete a short questionnaire. forms.office.com/e/rKxHjS3NW4 Find more information: cancerresearchuk.org/about-cancer/f…
Looked at Google Scholar, nice to see that my h-index has reached 60
If a woman is considering IVF and uses an AI model to predict likelihood of success (having a baby), what questions and explanations would she ask for? Below shows that questions about what information model considers are more likely than questions about how model works
New blog: Patients want to know what information an AI model considers Adarsa Sivaprasad is exploring what questions users of AI health prediction model actually have. Many questions about what information a model considers, fewer about how model works ehudreiter.com/2025/06/25/pat…
Fascinating discussion with clinician about LLM texts we want to show to patients. Pointed out that some of the material in LLM text came from websites of private clinics which are trying to encourage patients to sign up for expensive and profitable interventions.
Congratulations to my student Adarsa Sivaprasad for winning a best PhD student poster award at @UK_healtex, for her work on "A conversational agent to address patient needs for out-of-distribution explanations"!
