Yusen Zhang
@YusenZhangNLP
PhD Candidate @PennStateEECS | NLP Lab @NLP_PennState #NLProc | Prev Research Intern @MSFTResearch, @AmazonScience @GoogleAI
🚀 How Far Are VLMs from Effective High-Resolution Image Understanding? 👉 We found: Still far. 🆕 Introducing HRScene Benchmark: 📸 25 Real-world Scenes + 🧪 2 Diagnostic NIAH Tests 🏙️ 8 Categories: Daily, Paper, Urban Planning, etc. 🖼️ Resolution: 1,024 × 1,024 ➡️ 35,503 ×…


Our paper VisOnlyQA has been accepted to @COLM_conf #COLM2025! See you in Montreal🍁 We find that even recent Vision Language Models struggle with simple questions about geometric properties in images, such as "What is the degree of angle AOD?"🧐 arxiv.org/abs/2412.00947
📢 New preprint! Do LVLMs have strong visual perception capabilities? Not quite yet... We introduce VisOnlyQA, a new dataset for evaluating the visual perception of LVLMs, and find that existing LVLMs perform poorly on it. [1/n] arxiv.org/abs/2412.00947 github.com/psunlpgroup/Vi…
We updated our VisOnlyQA paper for #COLM2025! * LVLMs exhibit weak geometric perception even on geometric shapes with 2–3 lines 😭 * Gemini 2.5 Pro largely improves over prior models on charts and chemistry 😳 but still struggles with geometric shapes 😖 arxiv.org/abs/2412.00947
HRScene got accepted at #ICCV2025! HRScene is a novel unified benchmark for high-resolution image understanding with 25 scenes and 2 NIAH tests. Home page: yszh8.github.io/hrscene/ (Sorry, the EvalAI submission portal is currently down...) My PhD research began with long text…
Vision Language Models display a peculiar blind spot: their ability to process image content follows a U-shaped pattern in the Manhattan distance from the image corners, suggesting fundamental limitations in handling high-resolution layouts.
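A minimal sketch (mine, not the paper's code) of how such a diagnostic could be computed, assuming a needle-in-a-haystack placement grid where each cell has a measured accuracy:

```python
from collections import defaultdict

def manhattan_to_nearest_corner(row: int, col: int, n_rows: int, n_cols: int) -> int:
    """Manhattan distance from cell (row, col) to the closest of the 4 corners."""
    corners = [(0, 0), (0, n_cols - 1), (n_rows - 1, 0), (n_rows - 1, n_cols - 1)]
    return min(abs(row - r) + abs(col - c) for r, c in corners)

def accuracy_by_corner_distance(cell_accuracy, n_rows=8, n_cols=8):
    """Average per-cell accuracy, binned by distance to the nearest corner.

    `cell_accuracy` maps (row, col) -> accuracy; an 8x8 grid is an assumption.
    Plotting the returned dict reveals whether accuracy is U-shaped in distance.
    """
    bins = defaultdict(list)
    for (row, col), acc in cell_accuracy.items():
        bins[manhattan_to_nearest_corner(row, col, n_rows, n_cols)].append(acc)
    return {d: sum(v) / len(v) for d, v in sorted(bins.items())}
```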
NeuroGen: We explored a training-free idea—using prompts to guide large models to generate neural net parameters for downstream tasks. It may ease hyperparam sensitivity & data dependence. Still early, feedback welcome! 📄 arxiv.org/pdf/2505.12470 #FoundationModels #LLM
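A toy sketch of the general idea, with heavy caveats: the prompt, the JSON parsing, and the `llm_complete` callable below are all illustrative assumptions, not the paper's actual pipeline:

```python
import json
import numpy as np

def generate_linear_head(llm_complete, n_features: int, n_classes: int) -> np.ndarray:
    """Ask an LLM (via a user-supplied `llm_complete` callable) to emit the
    weights of a small linear classification head, then parse and load them.
    No gradient updates are involved, hence 'training-free'."""
    prompt = (
        f"Return only a JSON list of {n_classes} rows of {n_features} floats, "
        "to be used as the weight matrix of a linear sentiment classifier."
    )
    weights = np.array(json.loads(llm_complete(prompt)), dtype=np.float32)
    assert weights.shape == (n_classes, n_features)
    return weights
```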
📢 New paper! FoVer enhances PRMs for step-level verification of LLM reasoning w/o human annotation 🚀 We synthesize training data using formal verification tools and improve LLMs at step-level verification of LLM responses on MATH, AIME, MMLU, BBH, etc. arxiv.org/abs/2505.15960
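A hedged sketch of what annotation-free step-level data synthesis could look like; the `verify_step` callable stands in for a formal verification tool, and all field names are illustrative assumptions rather than the paper's format:

```python
from dataclasses import dataclass

@dataclass
class StepLabel:
    step_text: str
    is_correct: bool  # label produced automatically by a formal verifier, not a human

def synthesize_example(problem: str, steps: list[str], verify_step) -> dict:
    """Attach an automatic correctness label to every reasoning step.

    `verify_step(problem, steps_so_far, step)` is assumed to wrap a formal
    verification tool and return a bool; the resulting records can train a
    process reward model (PRM) for step-level verification.
    """
    labels, so_far = [], []
    for step in steps:
        labels.append(StepLabel(step, verify_step(problem, so_far, step)))
        so_far.append(step)
    return {"problem": problem, "steps": [vars(s) for s in labels]}
```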
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding? "we introduce HRScene, a novel unified benchmark for HRI understanding with rich scenes. HRScene incorporates 25 real-world datasets and 2 synthetic diagnostic datasets with resolutions ranging from…
Because Vision Language Models treat images as tokens, high-resolution images produce very long sequences, mirroring the long-context challenges in LLMs. In this new paper, we release a benchmark to test VLMs' ability to understand high-resolution images, up to hundreds of millions of…
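A quick back-of-the-envelope illustration of why this resembles long context, assuming naive ViT-style 14×14 patchification with no tiling or downsampling (real VLMs vary):

```python
def approx_image_tokens(height: int, width: int, patch: int = 14) -> int:
    """Patch-token count for an image under naive patchification."""
    return (height // patch) * (width // patch)

print(approx_image_tokens(1024, 1024))      # 5,329 tokens -> already a long sequence
print(approx_image_tokens(16_384, 16_384))  # 1,368,900 tokens (hypothetical size)
```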
Ever wondered how much you can trust a benchmark? We did too - so we built SMART to make them smarter! I will be presenting our work at NAACL along with my coauthors @megnung and @DavidPantoja__. See you all in New Mexico!
🚨 New paper alert 🚨 Ever struggled with quick saturation or unreliability in benchmark datasets? Introducing SMART Filtering to select high-quality examples, reducing dataset size by 48% on avg (up to 68% for ARC!) and improving correlation with scores from ChatBot Arena! 📈✨ (1/N)
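A generic sketch in the spirit of benchmark filtering; the specific criteria, thresholds, and callables below are my assumptions, not the paper's exact method:

```python
def smart_style_filter(examples, model_correct, sim, easy_frac=1.0, sim_thresh=0.9):
    """Keep a smaller, more informative subset of a benchmark.

    Drops (1) examples that all reference models already answer correctly
    (too easy to discriminate) and (2) examples nearly identical to one
    already kept (redundant). `model_correct(model, ex)` and `sim(a, b)`
    are user-supplied; the model names below are placeholders.
    """
    kept = []
    for ex in examples:
        votes = [model_correct(m, ex) for m in ("model_a", "model_b", "model_c")]
        if sum(votes) / len(votes) >= easy_frac:         # every model solves it
            continue
        if any(sim(ex, k) >= sim_thresh for k in kept):  # near-duplicate
            continue
        kept.append(ex)
    return kept
```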
Want to learn about fairness in summarization? @HaoyuanLi9 will present our work on fairness in multidocument summarization at #NAACL25 in a couple of hours. This was in collaboration with @YusenZhangNLP and @ruizhang_nlp
Happy to announce that our paper received an SAC award for summarization at @naaclmeeting. I will present the paper in Session H: Oral/Poster 5, Thursday, May 1, 14:00-15:30.
This work is led by my first PhD student Yusen @YusenZhangNLP, who is graduating soon and actively seeking a postdoc position in academia. It has been an absolute pleasure to work with him. Please consider hiring him!
🎉Our paper on fairness of multidoc summarization has received an SAC award at NAACL 2025! 🥳 We appreciate the recognition from senior area chairs. @HaoyuanLi9 and @YusenZhangNLP will present our work: Posters (Exhibit Hall), Session H: Oral/Poster 5, Thursday May 1,…
🟢 Announcing the #NAACL2025 Award Winners! The Best Paper and Best Theme Paper winners will present at our closing session 2025.naacl.org/blog/best-pape…
I will be at NAACL this week. Feel free to find me if you'd like to discuss this project or any other research topic!