Yinhong Liu
@YinhongLiu2
PhD student @CambridgeLTL @Cambridge_Uni. Previously a research intern at Siri/AIML @Apple and @MSFTResearch. Interested in #ML, #NLProc and #LLM.
🚨 New Paper Alert! 🚨 When using LLMs as judges, ever wondered how consistent those judgments are? 🤔 Check out our latest work, where we quantify, evaluate, and enhance the logical/preference consistency of LLMs. 📚 🔗 Read more: arxiv.org/abs/2410.02205
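To make "consistency" concrete, here is a minimal sketch of one way to quantify it (an illustration only, not the paper's exact metric): given a judge's pairwise preferences, count the fraction of item triads whose judgments are transitive.

```python
from itertools import combinations

def transitivity_rate(prefers):
    """Fraction of item triads whose pairwise judgments are transitive.

    `prefers[(a, b)]` is True iff the judge prefers a over b; this toy
    interface assumes a judgment exists for every unordered pair.
    """
    items = {x for pair in prefers for x in pair}
    beats = lambda a, b: prefers[(a, b)] if (a, b) in prefers else not prefers[(b, a)]
    consistent = total = 0
    for a, b, c in combinations(sorted(items), 3):
        total += 1
        # The triad is cyclic (intransitive) iff a>b, b>c, c>a all hold,
        # or all fail (which is the same cycle in the other direction).
        wins = [beats(a, b), beats(b, c), beats(c, a)]
        if not (all(wins) or not any(wins)):
            consistent += 1
    return consistent / total if total else 1.0

# Example: A > B, B > C, but C > A -> one cyclic triad, rate 0.0.
judgments = {("A", "B"): True, ("B", "C"): True, ("A", "C"): False}
print(transitivity_rate(judgments))
```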

🚀Let’s Think Only with Images. No language and no verbal thought.🤔 Let’s think through a sequence of images💭, just as humans picture steps in their minds🎨. We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.
🔥Are we ranking LLMs correctly?🔥 Large Language Models (LLMs) are widely used as automatic judges, but what if their rankings are unstable?😯Our latest study finds non-transitivity in LLM-as-a-judge evaluations—where A > B, B > C, but… C > A?! 🔄
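Here is a toy illustration of why non-transitivity destabilizes rankings (the judge below is a hard-coded stand-in, not an actual LLM call): with cyclic preferences A > B > C > A, sorting the same three outputs from different starting orders produces different "rankings".

```python
import functools

# Hypothetical intransitive judge: A beats B, B beats C, C beats A.
WINS = {("A", "B"), ("B", "C"), ("C", "A")}

def judge(x, y):
    # Returns -1 if x should rank above y, else 1 (toy stand-in for an LLM judge).
    return -1 if (x, y) in WINS else 1

for start in (["A", "B", "C"], ["C", "B", "A"]):
    ranking = sorted(start, key=functools.cmp_to_key(judge))
    print(start, "->", ranking)
# The same three candidates get different rankings depending on the order
# in which they were compared -- the hallmark of a non-transitive judge.
```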
🌟 MMR1 Multimodal Reasoning Project Now Open-Source! We’re thrilled to announce the release of MMR1, an open-source project dedicated to advancing multimodal reasoning research. The first milestone is MMR1-Math, a specialized multimodal model for mathematical tasks, achieving…
🚨New Paper Alert🚨 Many personalization methods optimize performance but ignore real-world impact. We examine personalization's effects on: ✅ Performance ⚖️ Fairness: Can it represent minorities fairly? ⚠️ Unintended Effects: Does it harm safety? 🔄 Adaptability: Can it quickly adapt to new users?
Long-text factuality is a challenging topic and here’s our cheap & effective approach! 🚀🚀🚀
‼️New Paper Alert‼️ ⁉️ How can we perform fine-grained fact-checking on long texts efficiently❓ GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking lnkd.in/gy3YXkG3 (1/3)
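Roughly, the idea is to turn a long document into discrete facts and check each one. A minimal sketch of that pipeline (illustrative only; `extract_triples` and the toy KG below are hypothetical stand-ins, not GraphCheck's API):

```python
def extract_triples(text):
    """Stand-in for an extraction step mapping long text to (subj, rel, obj) triples."""
    return [("Marie Curie", "won", "Nobel Prize in Physics"),
            ("Marie Curie", "born_in", "Paris")]  # toy output

KG = {("Marie Curie", "won", "Nobel Prize in Physics"),
      ("Marie Curie", "born_in", "Warsaw")}       # toy reference graph

def check(text):
    # Each extracted triple becomes one fine-grained check against the KG,
    # so errors are localized to specific facts rather than whole documents.
    return [(t, t in KG) for t in extract_triples(text)]

for triple, ok in check("...long document..."):
    print("SUPPORTED" if ok else "REFUTED", triple)
```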
𝐛𝐞𝐬𝐭-𝐨𝐟-𝐧 is a strong baseline for - improving agents - scaling inference-time compute - preference alignment - jailbreaking models How does 𝐁𝐨𝐍 work? And why is it so strong? Find some answers in the paper we wrote over two Christmas breaks!🧵
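At its core, best-of-n is just "sample n, keep the best". A minimal sketch (the `generate` and `reward` callables are placeholders for your own sampler and reward model):

```python
import random

def best_of_n(prompt, generate, reward, n=8):
    """Sample n candidates and keep the one the reward model scores highest.

    `generate` and `reward` are placeholders for a sampler and a reward
    model; best-of-n itself is just this argmax over samples.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy demo: the "reward model" simply prefers longer strings.
gen = lambda p: p + " " + "".join(random.choice("ab") for _ in range(random.randint(1, 5)))
print(best_of_n("answer:", gen, reward=len, n=8))
```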
Forget just thinking in words. 🚀 New Era of Multimodal Reasoning🚨 🔍 Imagine While Reasoning in Space with MVoT Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.
🚀 Interested in building a reliable PRM? Check out our new paper on PRMBENCH – the first benchmark for process-level reward models! To facilitate research, we’ve also released a "PRM-Eval Toolkit" to evaluate various PRMs & tasks! 🤗 #AI #Benchmark #PRM
Is your Process-Level Reward Model really good? 🤔 We're thrilled to release PRMBENCH: A Fine-grained and Challenging Benchmark for Process-Level Reward Models! This new resource offers a deeper dive into PRM evaluation. Explore the paper & project page here 👇 📄[Paper Link]…
I'll be presenting CLUES🔍 at #NeurIPS2024 in person! Catch us at the poster session on: ⏰ Wed, Dec 11, 4:30–7:30 PM PST 📍 East Exhibit Hall A-C #1902 (Add it to your calendar: tinyurl.com/neurips-clues 😊)
Excited to share our work "CLUES🔍: Collaborative Private-domain High-quality Data Selection for LLMs via Training Dynamics" We propose an automated high-quality data selection method for LLMs in collaborative settings (e.g., federated learning, model merging, multi-agent…
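As a rough illustration of selecting data via training dynamics (this is a generic scorer for intuition, not CLUES's actual criterion), one could rank samples by how much their training loss improves and keep the most learnable fraction:

```python
import numpy as np

def select_by_training_dynamics(losses, keep_frac=0.5):
    """Illustrative scorer (not CLUES's actual rule): `losses` is an
    (epochs x samples) array of per-sample training losses; rank samples
    by how much their loss drops over training and keep the top fraction.
    """
    improvement = losses[0] - losses[-1]           # loss drop per sample
    k = max(1, int(keep_frac * losses.shape[1]))
    return np.argsort(-improvement)[:k]            # indices of selected samples

losses = np.array([[2.0, 2.1, 1.9],   # epoch 0
                   [1.0, 2.0, 0.8]])  # final epoch
print(select_by_training_dynamics(losses, keep_frac=0.66))  # sample 2 learns best
```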
Life update: 🎉 I'm excited to share that I will be joining @HKUSTGuangzhou as an Assistant Professor in Spring 2025! I'm looking for multiple PhDs and interns who are passionate about exploring research questions related to knowledge and reasoning in the context of LLMs. 🤖
🔥Check out our EMNLP paper with @vlachos_nlp and @ZhijiangG 🤔Do We Need Language-Specific Fact-Checking Models? The Case of Chinese arxiv.org/abs/2401.15498 ‼️ We find domain and cultural biases in Chinese fact-checking that necessitate language-specific tools!
Attending #EMNLP2024 virtually📺! If you've ever wondered how to PROMPT your LLM-as-a-Judge⚖️, stay tuned! We will present ZEPO, "Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments", in Gather Room 147 on Tue. 12 at 17:45. See you online🚀
Which output is better? [A] or [B]? LLM🤖: B❌ [B] or [A]? LLM🤖: A✅ Thrilled to share our preprint addressing preference biases in LLM judgments!🧑⚖️ We introduce ZEPO, a zero-shot prompt optimizer that enhances your LLM evaluators via fairness⚖️ 📰Paper: arxiv.org/abs/2406.11370
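For context, here is a minimal way to measure that positional bias (a sketch; `judge` stands in for an actual LLM call): query each pair in both orders and count how often the chosen output flips.

```python
def flip_rate(judge, pairs):
    """Fraction of pairs whose verdict flips when presentation order is swapped.

    `judge(a, b)` returns "first" or "second" (a stand-in for an LLM judge);
    a fair judge should pick the same underlying output either way.
    """
    flips = 0
    for a, b in pairs:
        forward, backward = judge(a, b), judge(b, a)
        winner_fwd = a if forward == "first" else b
        winner_bwd = b if backward == "first" else a
        flips += winner_fwd != winner_bwd  # inconsistent across orderings
    return flips / len(pairs)

# Toy biased judge that always prefers whichever output is shown second.
biased = lambda a, b: "second"
print(flip_rate(biased, [("out-A", "out-B"), ("out-C", "out-D")]))  # 1.0
```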
💥 Introducing "AutoPSV: Automated Process Supervised Verifier" - accepted at #NeurIPS2024! AutoPSV automatically annotates reasoning steps via confidence tracking, making it efficient and effective even without ground-truth answers. 🔗 arxiv.org/abs/2405.16802 🧵1/5
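To illustrate the confidence-tracking idea (a sketch of the gist, not AutoPSV's exact procedure; `confidence_after` is a hypothetical stand-in for the model's answer confidence given a prefix of steps), a sharp confidence drop after a step flags it as likely erroneous:

```python
def label_steps(confidence_after, steps, drop_threshold=0.2):
    """Label reasoning steps by tracking answer confidence (illustrative only).

    `confidence_after(prefix)` returns the model's confidence in the final
    answer given the steps so far; a large drop after a step marks it.
    """
    labels, prev = [], confidence_after([])
    for i in range(1, len(steps) + 1):
        cur = confidence_after(steps[:i])
        labels.append("ok" if prev - cur < drop_threshold else "suspect")
        prev = cur
    return labels

# Toy confidences: step 2 causes a sharp drop, so it gets flagged.
trace = {0: 0.6, 1: 0.7, 2: 0.3, 3: 0.35}
conf = lambda prefix: trace[len(prefix)]
print(label_steps(conf, ["s1", "s2", "s3"]))  # ['ok', 'suspect', 'ok']
```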