Haodong Duan
@KennyUTC
Researcher @ Shanghai AI Lab, working on multi-modal learning. B.S. @PKU1898 / Ph.D. @CUHKofficial. Built #VLMEvalKit for MLLM evaluation.
OpenCompass just released RISEBench, the first benchmark on Reasoning-Informed Visual Editing (RISE). GPT-4o Image Generation only scores 36% on this challenging task! Technical Report: huggingface.co/papers/2504.02… #GPT4o
Families may have been divided, but the world is united.
Just created a Gallery to display all generation results on RISEBench (by powerful models including GPT-4o Image, Gemini-2.0, Bagel, etc.). Please contact me if you want the results of your new model to be included! Tech Report: arxiv.org/abs/2504.02826
- VisualPRM for Test-Time Scaling of Visual Reasoning Problems: arxiv.org/abs/2503.10291
- 5%~10% avg. accuracy improvement over 7 mainstream benchmarks
- Released with 400K tuning data samples & 3K benchmark problems
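Test-time scaling with a process reward model usually means sampling several candidate reasoning chains and letting the PRM pick the best one. Below is a minimal Best-of-N sketch in Python; `sample_solution` and `prm_score_steps` are hypothetical stand-ins for the policy MLLM and VisualPRM, not the paper's actual API.

```python
import random

def sample_solution(problem: str, seed: int) -> list[str]:
    # Hypothetical stand-in: an MLLM would sample a step-by-step
    # reasoning chain for the (image, question) pair here.
    random.seed(seed)
    return [f"step {i}: ..." for i in range(random.randint(2, 5))]

def prm_score_steps(problem: str, steps: list[str]) -> list[float]:
    # Hypothetical stand-in: a process reward model scores each
    # intermediate step, not just the final answer.
    return [random.random() for _ in steps]

def best_of_n(problem: str, n: int = 8) -> list[str]:
    """Sample n candidate chains, keep the one whose steps
    the PRM rates highest on average."""
    candidates = [sample_solution(problem, seed) for seed in range(n)]

    def chain_score(steps: list[str]) -> float:
        scores = prm_score_steps(problem, steps)
        return sum(scores) / len(scores)

    return max(candidates, key=chain_score)

print(best_of_n("What is the area shaded in the figure?"))
```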

We just added SmolVLM2 to VLMEvalKit - now it is easier to evaluate your fine-tunes 🥰😊
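For reference, a minimal sketch of querying a model through VLMEvalKit's Python API. The registry key `'SmolVLM2'` is my assumption; list `supported_VLM.keys()` to find the identifier the library actually registers for your fine-tune.

```python
from vlmeval.config import supported_VLM

# 'SmolVLM2' is an assumed registry key; inspect supported_VLM.keys()
# to find the exact name VLMEvalKit uses for your checkpoint.
model = supported_VLM['SmolVLM2']()

# Single-sample inference: an image path followed by a question.
response = model.generate(['demo.jpg', 'What is in this image?'])
print(response)
```

Full-benchmark runs go through the repo's `run.py` instead, along the lines of `python run.py --data <benchmark> --model <key>`.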
We are analyzing the top papers on @huggingface (~4,000 papers, mostly related to LLMs), and here is a list of the top 20 authors with the most papers published in under 2 years. All of them are Asian (though not necessarily in Asia). This is no competition; these alphas OWN the game.
Lame
DeepSeek is a wake-up call for America, but it doesn't change the strategy:
- The USA must out-innovate & race faster, as we have done in the entire history of AI
- Tighten export controls on chips so that we can maintain future leads
Every major breakthrough in AI has been American
After 1 year of building, VLMEvalKit now reaches 100+ contributors. On the journey of exploring LMM capabilities, we will go further: github.com/open-compass/V…

OpenCompass has established a leaderboard to evaluate the complex reasoning capability of LMMs, consisting of four advanced multi-modal math reasoning benchmarks. Currently, Gemini-2.0-Flash takes 1st place. DM me to suggest more benchmarks and models for this leaderboard.

Real Research :lol
As my kids are singing APT non-stop these days, I did a bit of reverse engineering of the APT music video and tried to understand why the MV is so addictive. Here is what I learned.
Mitigating racial bias in LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference, @NeurIPSConf. We have ethical reviews for authors, but missed them for invited speakers? 😡
I HATE this website
A question for everyone: how can I permanently block "CSDN Blog" results from appearing in Google search pages? They are driving me crazy.
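One workaround (my own suggestion, not from the thread): Google's `-site:` operator excludes a domain per query, and for a truly permanent fix a blocklist extension such as uBlacklist with a rule like `*://*.csdn.net/*` does it browser-wide. A small Python sketch of the per-query version:

```python
from urllib.parse import quote_plus

def google_search_url(query: str, exclude_domain: str = "blog.csdn.net") -> str:
    # The -site: operator tells Google to drop results from that domain.
    return "https://www.google.com/search?q=" + quote_plus(f"{query} -site:{exclude_domain}")

print(google_search_url("pytorch dataloader stuck"))
# https://www.google.com/search?q=pytorch+dataloader+stuck+-site%3Ablog.csdn.net
```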
WILD! Some researchers have republished ResNet under their own names in a predatory journal. @CVPR
The figure is very "apple"
Apple released AIMv2 🍏 a family of state-of-the-art open-set vision encoders
> like CLIP, but with a decoder added and trained on autoregression 🤯
> 19 open models in 300M, 600M, 1.2B, and 2.7B sizes, with resolutions of 224, 336, and 448
> loadable and usable with 🤗 transformers
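A minimal sketch of loading one of the checkpoints with 🤗 transformers to extract image features. The repo id and processor class are my assumptions based on the release naming; verify against the model card before use.

```python
from transformers import AutoImageProcessor, AutoModel
from PIL import Image

model_id = "apple/aimv2-large-patch14-224"  # assumed repo id; check the AIMv2 model cards
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("demo.jpg")  # any local image
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
features = outputs.last_hidden_state  # patch-level visual features (assuming the usual output layout)
print(features.shape)
```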
Ovis demonstrates impressive performance on the Open VLM Leaderboard among lightweight (<10B) VLMs. You can check the results here: huggingface.co/spaces/opencom…
Curious about the story behind OVIS? They're doing a broadcast now. Check it out and ask questions :) x.com/i/broadcasts/1…