Xueqing Wu
@xueqing_w
NLPer working on vision-language models | PhD student @CS_UCLA | MS @IllinoisCS
Can VLMs improve themselves? We propose VISCO, a benchmark to evaluate VLMs' critique and correction capabilities, towards the higher goal of autonomous VLM self-improvement. Project: visco-benchmark.github.io Paper: arxiv.org/abs/2412.02172
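Purely as an illustrative sketch (not VISCO's actual pipeline or API), the kind of critique-and-correction loop such a benchmark probes could be wired up as below; the answer, critique, and correct callables are hypothetical stand-ins for prompted VLM calls.

```python
from typing import Callable, Tuple

def self_critique_loop(
    answer: Callable[[str, str], str],                      # (image_path, question) -> initial answer
    critique: Callable[[str, str, str], Tuple[bool, str]],  # (..., answer) -> (accept?, feedback)
    correct: Callable[[str, str, str, str], str],           # (..., answer, feedback) -> revised answer
    image_path: str,
    question: str,
    max_rounds: int = 2,
) -> str:
    """Answer, self-critique, and revise until the critique accepts the answer."""
    ans = answer(image_path, question)
    for _ in range(max_rounds):
        ok, feedback = critique(image_path, question, ans)
        if ok:
            break
        ans = correct(image_path, question, ans, feedback)
    return ans
```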


Hello, Kimi K2! Open-Source Agentic Model! • 1T total / 32B active MoE model • SOTA on SWE-bench Verified, Tau2 & AceBench among open models • Strong in coding and agentic tasks • Multimodal & thought-mode not supported for now. With Kimi K2, advanced agentic intelligence…
#ICCV2025 Introducing X-Fusion: Introducing New Modality to Frozen Large Language Models. It is a novel framework that adapts pretrained LLMs (e.g., LLaMA) to new modalities (e.g., vision) while retaining their language capabilities and world knowledge! (1/n) Project Page: …
Meet Embodied Web Agents that bridge physical and digital realms. Imagine embodied agents that can search for online recipes, shop for ingredients, and cook for you. Embodied web agents search internet information to carry out real-world embodied tasks. All data, code, and web…
One last call for the poster! Check out our VISCO benchmark for a deeper understanding of VLM self-critique and reflection. Come visit us at ExHall D #396 at 4-6pm!
New work: LLMs still struggle at event detection due to poor long-context reasoning and an inability to follow task constraints, causing precision and recall errors. We introduce DiCoRe, a lightweight three-stage Divergent-Convergent reasoning framework to fix this. (1/N)
Correction: our poster is 4-6pm, Friday, ExHall D #396. Welcome to drop by!
Attending my first CV conference ever as an NLPer! So excited to connect with more people! Check out our VISCO benchmark for VLM self-critique and correction at Poster #396, Friday, 2-4pm. We're also presenting at the BEAM workshop on Wednesday: beam-workshop2025.github.io
(1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits to text data? We are excited to announce LaViDa, one of the first and fastest large diffusion LMs for vision-language understanding!
How culturally safe are large vision-language models? LVLMs often miss the mark. We introduce CROSS, a benchmark of 1,284 image-query pairs across 16 countries & 14 languages, revealing how LVLMs violate cultural norms in context. Evaluation via CROSS-EVAL. Safety…
New blog drop! "Reflection on Knowledge Editing: Charting the Next Steps" is live! Ever wondered why knowledge editing in LLMs still feels more like a lab experiment than a real-world solution? In this post, we dive deep into where the research is thriving, and where…
Attending NAACL to present BRIEF (Friday 11am, hall 3) and Self-Routing RAG (KnowledgeNLP Workshop). Looking forward to meeting new and old friends!
#GPT4o image generation brings synthetic visual data quality to the next level. Is synthetic visual data finally ready to be used for improving VLMs? We show success with CoDA, using contrastive visual data augmentation to help teach VLMs novel and confusing concepts.
Computer Use Agent Arena is LIVE! • The easiest way to test computer-use agents in the wild without any setup • Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL, and more • Test agents on 100+ real apps & websites with one-click config • Safe & free…
New NLP seminar series alert! Check out the UCLA NLP Seminar series featuring cutting-edge talks from top researchers in NLP and related areas. Great lineup, timely topics, and open to all (Zoom)! Schedule + details: uclanlp.github.io/nlp-seminar/
Introducing Self-Routing RAG, a framework that equips selective retrieval with the ability to (1) route between multiple knowledge sources and (2) fully leverage the parametric knowledge of the LLM itself. Paper: arxiv.org/abs/2504.01018 (1/N)
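To make those two abilities concrete, here is a minimal illustrative sketch of selective retrieval with source routing, not the actual Self-Routing RAG implementation; generate and the per-source retrievers are hypothetical caller-supplied functions.

```python
from typing import Callable, Dict

def route_and_answer(
    query: str,
    generate: Callable[[str], str],               # LLM call: prompt -> text
    retrievers: Dict[str, Callable[[str], str]],  # knowledge source name -> retriever
) -> str:
    # Let the LLM itself decide whether retrieval is needed and, if so, from which source.
    options = ", ".join(["none"] + sorted(retrievers))
    choice = generate(
        f"Question: {query}\n"
        f"Which knowledge source should be consulted? Reply with exactly one of: {options}."
    ).strip().lower()

    if choice in retrievers:
        context = retrievers[choice](query)
        return generate(f"Context: {context}\nQuestion: {query}\nAnswer:")
    # Otherwise fall back to the model's own parametric knowledge (no retrieval).
    return generate(f"Question: {query}\nAnswer:")
```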
Scaling test-time compute via generative verification (GenRM) is an emerging paradigm and has been claimed to be more efficient than self-consistency (SC) for reasoning. But such claims are misleading: our compute-matched analysis shows that SC outperforms GenRM across most budgets!
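For intuition about what compute-matched means here, a simplified sketch (not the paper's exact protocol): give both methods the same call budget, under the assumption that one verifier call costs roughly as much as one solution sample. sample_solution and verify are hypothetical functions: one returns the final answer from a sampled reasoning chain, the other scores a candidate answer.

```python
from collections import Counter
from typing import Callable

def self_consistency(sample_solution: Callable[[str], str], question: str, budget: int) -> str:
    # Spend the entire budget on sampled solutions, then majority-vote the final answers.
    answers = [sample_solution(question) for _ in range(budget)]
    return Counter(answers).most_common(1)[0][0]

def generative_verification(
    sample_solution: Callable[[str], str],
    verify: Callable[[str, str], float],   # (question, answer) -> score, hypothetical verifier
    question: str,
    budget: int,
) -> str:
    # Split the same budget: half on candidate solutions, half on verifier calls.
    n_candidates = max(budget // 2, 1)
    candidates = [sample_solution(question) for _ in range(n_candidates)]
    return max(candidates, key=lambda a: verify(question, a))
```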
Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. By iteratively integrating SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%…
Check out our latest work on knowledge editing for multi-hop reasoning! Paper: arxiv.org/pdf/2503.16356 Code: github.com/zjunlp/CaKE
Introducing CaKE: Circuit-aware Knowledge Editing for LLMs! Current knowledge editing methods update single facts but struggle with multi-hop reasoning. We propose CaKE to solve this by aligning edits with the model's reasoning pathways, enabling accurate and consistent…
Video generative models hold the promise of being general-purpose simulators of the physical world. How far are we from this goal? Excited to announce VideoPhy-2, the next edition in the series, testing the physical plausibility of generated videos of real-world actions.
New findings on knowledge overshadowing! Why do LLMs hallucinate even when trained on all-true data? Can we predict hallucinations even before model training or inference? Check out our new preprint: arxiv.org/pdf/2502.16143 The Law of Knowledge Overshadowing: Towards…