Haoyi Qiu
@HaoyiQiu
Research intern @SFResearch ☁️ PhD student @UCLANLP 🧸 BS in CS&Math @UMich 〽️ #NLP #Multimodal #Safety 🌷
🌏How culturally safe are large vision-language models? 👉LVLMs often miss the mark. We introduce CROSS, a benchmark of 1,284 image-query pairs across 16 countries & 14 languages, revealing how LVLMs violate cultural norms in context. ⚖️ Evaluation via CROSS-EVAL 🧨 Safety…

Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reasoning about unseen objects behind furniture / beyond current view? Check out MindCube! 🌐mll-lab-nu.github.io/mind-cube/ 📰arxiv.org/pdf/2506.21458…
🤔 Have @OpenAI o3, Gemini 2.5, Claude 3.7 formed an internal world model to understand the physical world, or just align pixels with words? We introduce WM-ABench, the first systematic evaluation of VLMs as world models. Using a cognitively-inspired framework, we test 15 SOTA…
Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any time to any view at any other time? Introducing 4D-LRM: a Large Space-Time Reconstruction Model that ... 🔹 Predicts 4D Gaussian primitives directly from…
Glad to be part of the team! It's been a great pleasure working with so many talented people at Tesla (both in and out of this photo), and under the guidance of great leaders @elonmusk, @ThomasAlxDmy, @philduan, @aelluswamy and many more.
Hi there!
Great share as usual! Just read this related piece where a study showed issues with LLM-based agents not recognizing sensitive information and not adhering to appropriate data handling protocols: theregister.com/2025/06/16/sal… paper: arxiv.org/abs/2505.18878
🚨 New work: LLMs still struggle at Event Detection due to poor long-context reasoning and inability to follow task constraints, causing precision and recall errors. We introduce DiCoRe — a lightweight 3-stage Divergent-Convergent reasoning framework to fix this.🧵📷 (1/N)
LLMs are helpful for scientific research — but will they continue to be helpful? Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new, and 38%+ projection of future (arxiv.org/abs/2505.24302).
🚨 The Business AI Plot Thickens 🚨 CRMArena set the stage for business AI evaluation in realistic environments. Now we're back with CRMArena-Pro - a major expansion that extends to 19 work tasks across diverse business applications (sales, service, and CPQ processes). It covers…
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: +28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
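The three reward schemes the thread compares can be illustrated as reward functions plugged into an RLVR loop. This is a minimal sketch for intuition only: the function names and the 50/50 coin flip are assumptions, and the actual training setup, verifier, and model are not described in the tweet.

```python
import random

# Illustrative sketch of the three reward signals compared in the
# Spurious Rewards experiments (hypothetical helper names; the real
# RLVR pipeline is not shown here).

def ground_truth_reward(answer: str, gold: str) -> float:
    # Standard RLVR: reward only answers verified against the gold label.
    return 1.0 if answer.strip() == gold.strip() else 0.0

def random_reward(answer: str, gold: str) -> float:
    # "Spurious" variant: a coin flip, independent of correctness.
    return float(random.random() < 0.5)

def incorrect_reward(answer: str, gold: str) -> float:
    # Inverted variant: reward only answers that are wrong.
    return 1.0 - ground_truth_reward(answer, gold)
```

The surprising result is that even the random and inverted signals, which carry no (or negative) information about correctness, still improve MATH-500 scores, only a few points behind the ground-truth signal.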
🚨Do passage rerankers really need explicit reasoning?🤔—Maybe Not! Our findings: ⚖️Standard rerankers outperform those w/ step-by-step reasoning! 🚫Disabling reasoning in a reasoning reranker actually improves reranking accuracy!🤯 👇But, why? 📰arxiv.org/abs/2505.16886 (1/6)
Cultural safety in AI isn't just nice-to-have, it's essential ✅ Our new paper reveals that leading VLMs struggle with cultural appropriateness across different contexts. We developed CROSS, a multimodal cultural safety benchmark spanning 16 countries and 14 languages, to…
Top 2 takeaways from our work: 1. VLM visual features do contain info for visual arithmetic—but without fine-tuning a strong decoder, it remains locked. 2. Training VLMs on just 8 invariant properties can enhance chart and visual math tasks, matching SFT with 60% less data.
Excited to share that CogAlign is accepted at #ACL2025 Findings! We investigated the "Jagged Intelligence" of VLMs – their surprising difficulty with basic visual arithmetics (e.g., counting objects, measuring angles) compared to their strong performance on harder visual tasks.…
Vision Language Models (VLMs) are great at many things, but they often fumble when it comes to simple visual arithmetic like counting or comparing lengths, hindering their understanding of charts 📈 and geometry 📐. Our new paper explores why this happens 🧐 and discovers the…
🚨 New Blog Drop! 🚀 "Reflection on Knowledge Editing: Charting the Next Steps" is live! 💡 Ever wondered why knowledge editing in LLMs still feels more like a lab experiment than a real-world solution? In this post, we dive deep into where the research is thriving — and where…