Chenghao Xiao
@gowitheflow98
NLP PhD@Durham University
New method for robust and efficient LLM-based NLG evaluation: an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts.
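The core data trick behind inversion learning, as described above, is to train on the reverse direction: output → instruction instead of instruction → output. A toy sketch of how such inverted training examples could be constructed (not the paper's code; the function name and data are illustrative):

```python
# Hypothetical sketch: building an "inverted" training set for inversion learning.
# Instead of learning instruction -> output, the model is trained on
# output -> instruction, so it learns to reconstruct prompts that a
# specific model responds well to. Names and examples are illustrative.

def build_inversion_examples(pairs):
    """Flip (instruction, output) pairs into (output, instruction) training examples."""
    return [{"input": output, "target": instruction} for instruction, output in pairs]

pairs = [
    ("Summarise the article in one sentence.", "The study finds X improves Y."),
    ("Rate the fluency of this text from 1 to 5.", "4"),
]
inverted = build_inversion_examples(pairs)
```

A seq2seq model fine-tuned on such flipped pairs can then be queried with a desired output format to generate candidate model-specific evaluation prompts.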
I think this is not only a super dataset but it appears the recipe used to generate it may generalize well & so be useful beyond this task! I need to investigate to be sure. ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning huggingface.co/papers/2506.09…
Paper of the day on HF!
🚀We are thrilled to launch 'Lingshu' – A Generalist Medical Multi-modal Foundation Model! 🩻 🌟 Highlights of Lingshu: ⚕️ Unified knowledge across 12+ imaging modalities (X-Ray, CT, MRI & more!). 🧠 Enhanced reasoning & reduced hallucinations via novel data curation and…
🚀 Discover how LLMs perceive their knowledge boundaries across languages in our #ACL2025 main paper! 🌍 By probing LLMs’ internal representations, we reveal key insights on where knowledge boundaries are encoded & propose a training-free method to combat cross-lingual…
[1/n] Can we generate highly effective, model-specific prompts via inversion learning? Delighted to introduce our new paper: Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts. Paper: arxiv.org/abs/2504.21117 HF Daily: huggingface.co/papers/2504.21…
You might've heard of MTEB: Massive Text Embedding Benchmark. Until now, it was famously primarily for text, but now it has been extended to include MIEB: Massive Image Embedding Benchmark! Details in 🧵
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations Examines how LLMs internally distinguish known from unknown questions across languages. - signals localize to mid–upper layers - cross-lingual structure is linear—enables…
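The claim that the known/unknown signal is linearly encoded implies a simple linear classifier trained on layer-wise hidden states should separate the two classes. A toy sketch of such a linear probe (not the paper's code; the synthetic 2-D features stand in for real hidden states):

```python
# Toy sketch of a linear probe on per-layer hidden states.
# If "known vs. unknown" is linearly encoded, a linear classifier
# w.x + b > 0 trained on mid-layer representations should separate the classes.
# The 2-D feature vectors below are synthetic stand-ins for real hidden states.

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Train a linear probe: predict 'known' (1) when w.x + b > 0."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = label - pred  # perceptron update only on mistakes
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Synthetic "hidden states": known questions cluster in one half-space.
X = [[1.0, 0.8], [0.9, 1.1], [-1.0, -0.7], [-0.8, -1.2]]
y = [1, 1, 0, 0]
w, b = train_perceptron(X, y)

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Probing each layer this way, and comparing probe accuracy across layers, is one standard way to localize where such a signal emerges (e.g., the mid–upper layers mentioned above).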
Introducing the 3rd edition of BioLaySumm shared task hosted at the BioNLP Workshop @ #ACL2025NLP ! biolaysumm.org BioLaySumm is a shared task that focuses on generating easy-to-understand lay summaries for complex biomedical texts. Building on the success of the first…

LB: huggingface.co/spaces/mteb/le… Great work by @KCEnevoldsen @isaacchung1217 Imene Kerboua Márton Kardos @risolomatin @tomaarsen @gowitheflow98 @vaibhav_adlakha @orionweller @sivareddyg & many others ❤️
Check out Yiqi's work -- the first systematic study of the "self-preference" bias of LLMs as evaluators
[1/n] Do language model-driven evaluation metrics inherently favour texts generated by the same underlying model? Our #ACL2024 Findings paper (arxiv.org/abs/2311.09766) investigates this bias, focusing on metrics such as BARTScore, T5Score, and GPTScore in summarisation tasks. To…
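One simple way to quantify the self-preference bias described above is to compare the scores a model-based metric assigns to outputs from its own underlying model versus outputs from other models on the same inputs. An illustrative sketch (not the paper's protocol; all scores are made-up numbers):

```python
# Illustrative sketch: quantifying self-preference as the gap between the
# score an LLM-based metric gives its own model's outputs and the score it
# gives other models' outputs on the same inputs. Scores are made up.

def self_preference_gap(scores_own, scores_other):
    """Mean score on same-model outputs minus mean score on other-model outputs."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(scores_own) - mean(scores_other)

# A positive gap suggests the metric favours text from its own model family.
gap = self_preference_gap([0.82, 0.79, 0.85], [0.74, 0.71, 0.77])
```

In practice such raw gaps confound bias with genuine quality differences, so they are usually compared against human judgments of the same outputs.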
1/ Excited to announce the release of our new paper "SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval". This benchmark comprises 530K meticulously curated image-text pairs extracted from scientific documents (arXiv papers). arxiv.org/abs/2401.13478