Chenghao Xiao
@gowitheflow98
NLP PhD@Durham University
New method for robust and efficient LLM-based NLG evaluation: an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts.
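The core data trick behind inversion learning, as described above, is to train on the reverse direction: output → instruction instead of instruction → output. A toy sketch of how such inverted training examples could be constructed (not the paper's code; the function name and data are illustrative):

```python
# Hypothetical sketch: building an "inverted" training set for inversion learning.
# Instead of learning instruction -> output, the model is trained on
# output -> instruction, so it learns to reconstruct prompts that a
# specific model responds well to. Names and examples are illustrative.

def build_inversion_examples(pairs):
    """Flip (instruction, output) pairs into (output, instruction) training examples."""
    return [{"input": output, "target": instruction} for instruction, output in pairs]

pairs = [
    ("Summarise the article in one sentence.", "The study finds X improves Y."),
    ("Rate the fluency of this text from 1 to 5.", "4"),
]
inverted = build_inversion_examples(pairs)
```

A seq2seq model fine-tuned on such flipped pairs can then be queried with a desired output format to generate candidate model-specific evaluation prompts.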
I think this is not only a super dataset but it appears the recipe used to generate it may generalize well & so be useful beyond this task! I need to investigate to be sure. ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning huggingface.co/papers/2506.09…
Paper of the day on HF!
🚀We are thrilled to launch 'Lingshu' – A Generalist Medical Multi-modal Foundation Model! 🩻 🌟 Highlights of Lingshu: ⚕️ Unified knowledge across 12+ imaging modalities (X-Ray, CT, MRI & more!). 🧠 Enhanced reasoning & reduced hallucinations via novel data curation and…
🚀 Discover how LLMs perceive their knowledge boundaries across languages in our #ACL2025 main paper! 🌍 By probing LLMs’ internal representations, we reveal key insights on where knowledge boundaries are encoded & propose a training-free method to combat cross-lingual…
[1/n] Can we generate highly effective, model-specific prompts via inversion learning? Delighted to introduce our new paper: Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts. Paper: arxiv.org/abs/2504.21117 HF Daily: huggingface.co/papers/2504.21…
You might've heard of MTEB: Massive Text Embedding Benchmark. Until now, it was famously primarily for text, but now it has been extended to include MIEB: Massive Image Embedding Benchmark! Details in 🧵
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations Examines how LLMs internally distinguish known from unknown questions across languages. - signals localize to mid–upper layers - cross-lingual structure is linear—enables…
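The claim that the known/unknown signal is linearly encoded implies a simple linear classifier trained on layer-wise hidden states should separate the two classes. A toy sketch of such a linear probe (not the paper's code; the synthetic 2-D features stand in for real hidden states):

```python
# Toy sketch of a linear probe on per-layer hidden states.
# If "known vs. unknown" is linearly encoded, a linear classifier
# w.x + b > 0 trained on mid-layer representations should separate the classes.
# The 2-D feature vectors below are synthetic stand-ins for real hidden states.

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Train a linear probe: predict 'known' (1) when w.x + b > 0."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = label - pred  # perceptron update only on mistakes
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Synthetic "hidden states": known questions cluster in one half-space.
X = [[1.0, 0.8], [0.9, 1.1], [-1.0, -0.7], [-0.8, -1.2]]
y = [1, 1, 0, 0]
w, b = train_perceptron(X, y)

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Probing each layer this way, and comparing probe accuracy across layers, is one standard way to localize where such a signal emerges (e.g., the mid–upper layers mentioned above).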
Introducing the 3rd edition of BioLaySumm shared task hosted at the BioNLP Workshop @ #ACL2025NLP ! biolaysumm.org BioLaySumm is a shared task that focuses on generating easy-to-understand lay summaries for complex biomedical texts. Building on the success of the first…

LB: huggingface.co/spaces/mteb/le… Great work by @KCEnevoldsen @isaacchung1217 Imene Kerboua Márton Kardos @risolomatin @tomaarsen @gowitheflow98 @vaibhav_adlakha @orionweller @sivareddyg & many others ❤️
Check out Yiqi's work -- the first systematic study of the "self-preference" bias of LLMs as evaluators
[1/n] Do language model-driven evaluation metrics inherently favour texts generated by the same underlying model? Our #ACL2024 Findings paper (arxiv.org/abs/2311.09766) investigates this bias, focusing on metrics such as BARTScore, T5Score, and GPTScore in summarisation tasks. To…
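One simple way to quantify the self-preference bias described above is to compare the scores a model-based metric assigns to outputs from its own underlying model versus outputs from other models on the same inputs. An illustrative sketch (not the paper's protocol; all scores are made-up numbers):

```python
# Illustrative sketch: quantifying self-preference as the gap between the
# score an LLM-based metric gives its own model's outputs and the score it
# gives other models' outputs on the same inputs. Scores are made up.

def self_preference_gap(scores_own, scores_other):
    """Mean score on same-model outputs minus mean score on other-model outputs."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(scores_own) - mean(scores_other)

# A positive gap suggests the metric favours text from its own model family.
gap = self_preference_gap([0.82, 0.79, 0.85], [0.74, 0.71, 0.77])
```

In practice such raw gaps confound bias with genuine quality differences, so they are usually compared against human judgments of the same outputs.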
1/ Excited to announce the release of our new paper "SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval". This benchmark comprises 530K meticulously curated image-text pairs extracted from scientific documents (arXiv papers). arxiv.org/abs/2401.13478