Shiyu Ni@ACL 2025
@Shictyu
✈️ Vienna | Ph.D. candidate at the Institute of Computing Technology, Chinese Academy of Sciences | NLP; IR
😎 Our paper “Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective” is accepted to #acl2025 with @bikeping et al. We propose a psychometric-inspired framework to induce and evaluate implicit bias in LLMs. Project webpage: yuchenwen1.github.io/ImplicitBiasEv…
🚨 New release: MegaScience
The largest & highest-quality post-training dataset for scientific reasoning is now open-sourced (1.25M QA pairs)!
📈 Trained models outperform official Instruct baselines
🔬 Covers 7+ disciplines with university-level, textbook-grade QA
📄 Paper:…
The official website for SIGIR-AP 2025 is now live! Please visit: sigir-ap.org/sigir-ap-2025. This year, we are also inviting industry papers. We also encourage authors of unsuccessful SIGIR submissions to consider submitting to SIGIR-AP. We look forward to seeing you in Xi'an!
🚀 It's exciting to see how recent advancements like OpenAI’s o1/o3 & DeepSeek’s R1 are pushing the boundaries! Check out our latest survey on Complex Reasoning with LLMs. We analyzed over 300 papers to chart the progress. Paper: arxiv.org/pdf/2502.17419 Github: github.com/zzli2022/Aweso…
💡 Check out MAIR at #EMNLP2024, a large-scale IR benchmark! Highlights:
- Task Diversity: 126 realistic tasks, 8× more than BEIR 📈
- Domain Coverage: 6 domains and heterogeneous sources 📚
- Instruction Following: 805 relevance criteria
- Lightweight & Fast: optimized data sampling ⚡️
Why does GPT-4o Mini outperform Claude 3.5 Sonnet on LMSys? Formatting is important. Half a year ago, we started studying how response formats impact AI. We found that merely changing the response format boosts both readability and mathematical reasoning.
The Prompt Report: A Systematic Survey of Prompting Techniques. Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of…
Teach LLMs to express truthful confidence levels in “long-form generations”!
When LLMs are unsure, they either hallucinate or abstain. Ideally, they should clearly express truthful confidence levels. Our #ICML2024 work designs an alignment objective to achieve this notion of linguistic calibration in *long-form generations*. arxiv.org/abs/2404.00474 🧵