WikiResearch
@WikiResearch
Research news on @Wikipedia @Wikidata @Wikimedia, hot off the press. Volunteer-curated by @tilmanbayer & @mad_astronaut. Read our newsletter for full coverage
In the new edition of our monthly newsletter: * How readers use Wikipedia health content * Scholars are generally happy with how their papers are cited on Wikipedia * Several other papers about references on Wikipedia and more: meta.wikimedia.org/wiki/Research:…

3⃣ Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia Fri 1 Aug, 15:00 - 16:30 CET Room: 2.31 (WikiNLP Wrkshp) A new NLP task to organize textual information in knowledge networks, a massively multilingual benchmark dataset and model for 105 languages
"Factual Inconsistencies in Multilingual Wikipedia Tables" arxiv.org/html/2507.1840… "while English provides the most comprehensive coverage in terms of volume, German Wikipedia faces significant data quality challenges despite having substantial content"

"The afterlife of a ghost-written paper: How corporate authorship shaped two decades of glyphosate safety discourse", including on English Wikipedia doi.org/10.1016/j.envs… (editors held Monsanto's authorship didn't render it an inadmissible source as long as it wasn't retracted)


Wikidata for Botanists: Connecting People, Plants, and Data 🧑‍🔬 Now in @botanyone 🌿 botany.fyi/7dtcvb?utm_cam… Full paper: doi.org/10.1093/aob/mc…
Millions of GeARs: Extending GraphRAG to Millions of Documents Huawei presents a method to scale GraphRAG to millions of documents by adapting GeAR with online alignment between passages and Wikidata triples, avoiding costly LLM-based triple extraction 📝arxiv.org/abs/2507.17399
"The negotiation of pronominal address on talk pages of the German, French, and Italian Wikipedia" ids-pub.bsz-bw.de/frontdoor/deli… German and Italian "wikiquette" stipulates the informal "du"/"tu" among editors (instead of "Sie"/"Lei"). French Wikipedia lacks consensus on "vous" vs. "tu"
"WETBench: A Benchmark for Detecting Task-Specific Machine-Generated Text on Wikipedia" arxiv.org/html/2507.0337… "[AI] detectors from diverse families [e.g. Binoculars, FastDetectGPT] underperform on our data"
"Uniting and reigniting critical Wikimedia research" doi.org/10.1177/205395… Commentary detailing ten proposed action points from "A Manifesto for Wikimedia Research: Critically Studying Media as Infrastructure" (developed in context of last year's "Wikihistories" conference)

"Decoding revision mechanisms in Wikipedia" by examining 537 articles from WikiProject Climate change on English Wikipedia doi.org/10.1177/146144… E.g. "only 0.6% of edits were related to climate denial in the annotated samples of edit summaries"

ICWSM 2025 Test of Time honorable mention: "It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia" (2015) doi.org/10.1609/icwsm.… by @clauwa, @dgarcia_eu, Mohsen Jadidi, and @mstrohm (Review in our newsletter at the time: meta.wikimedia.org/wiki/Research:… )
Test of time honorable mention and award at #ICWSM 2025 🏆
ICWSM 2025 best paper honorable mention: "Protection from Evil and Good: The Differential Effects of Page Protection on Wikipedia Article Quality" (Thorsten Ruprechter, @manoelribeiro, @cervisiarius, Denis Helic) doi.org/10.1609/icwsm.… Earlier presentation: mediawiki.org/wiki/Wikimedia…
Best paper award and honorable mentions 🏆 Congrats to all authors! #icwsm
Explore Wikipedia through a data map. Pages are grouped by semantic similarity, for topic clusters. Hover to see details, zoom to explore fine-grained topics, click to go to a page. Search page names to find interesting starting points for exploration.🧵 lmcinnes.github.io/datamapplot_ex…
"The Wikipedia Editions of Low German and Other European Minority Languages" researchgate.net/profile/Carlos… ("Bots are editing more than 50% of Wikipedia articles in the Occitan, Low German, Piedmontese, Sardinian, and Kashubian Wikipedia editions.")

"ChatGPT shows a clear preference for Wikipedia" when citing sources (47.9% of citations within the top 10 most-cited sources, vs. 5.7% in Google AI Overviews and not reaching the top 10 in Perplexity) tryprofound.com/blog/ai-platfo…

Video recordings from last month's @wikiworkshop youtube.com/playlist?list=…
"Fake news, an internet troll, and a conspiracy theory about ‘Wikipedia's Intentional Distortion of the History of the Holocaust’" drive.google.com/file/d/1nCcu0C… Response by one of the Wikipedians criticized in a 2023 paper with that title. Cf. its review in our newsletter at the time:
In the new issue of our monthly newsletter: • English Wikipedia's "Intentional Distortion of the Holocaust" in Poland • News event coverage on four language Wikipedias: Some "self-focus bias" but also "strong evidence of a global representation" meta.wikimedia.org/wiki/Research:…