Xueqing Wu
@xueqing_w
NLPer working on vision-language models | PhD student @CS_UCLA | MS @IllinoisCS
Can VLMs improve themselves? We propose VISCO, a benchmark to evaluate VLMs' critique and correction capabilities, towards the higher goal of autonomous VLM self-improvement. Project: visco-benchmark.github.io Paper: arxiv.org/abs/2412.02172
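Purely as an illustrative sketch (not VISCO's actual pipeline or API), the kind of critique-and-correction loop such a benchmark probes could be wired up as below; the answer, critique, and correct callables are hypothetical stand-ins for prompted VLM calls.

```python
from typing import Callable, Tuple

def self_critique_loop(
    answer: Callable[[str, str], str],                      # (image_path, question) -> initial answer
    critique: Callable[[str, str, str], Tuple[bool, str]],  # (..., answer) -> (accept?, feedback)
    correct: Callable[[str, str, str, str], str],           # (..., answer, feedback) -> revised answer
    image_path: str,
    question: str,
    max_rounds: int = 2,
) -> str:
    """Answer, self-critique, and revise until the critique accepts the answer."""
    ans = answer(image_path, question)
    for _ in range(max_rounds):
        ok, feedback = critique(image_path, question, ans)
        if ok:
            break
        ans = correct(image_path, question, ans, feedback)
    return ans
```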


Hello, Kimi K2! Open-Source Agentic Model! • 1T total / 32B active MoE model • SOTA on SWE-bench Verified, Tau2 & AceBench among open models • Strong in coding and agentic tasks • Multimodal & thought-mode not supported for now. With Kimi K2, advanced agentic intelligence…
#ICCV2025 Introducing X-Fusion: Introducing New Modality to Frozen Large Language Models. It is a novel framework that adapts pretrained LLMs (e.g., LLaMA) to new modalities (e.g., vision) while retaining their language capabilities and world knowledge! (1/n) Project Page: …
Meet Embodied Web Agents that bridge physical and digital realms. Imagine embodied agents that can search for online recipes, shop for ingredients, and cook for you. Embodied web agents search internet information to carry out real-world embodied tasks. All data, code, and web…
One last call for the poster! Check out our VISCO benchmark for a deeper understanding of VLM self-critique and reflection. Come visit us at ExHall D #396 at 4-6pm!
New work: LLMs still struggle at event detection due to poor long-context reasoning and an inability to follow task constraints, causing precision and recall errors. We introduce DiCoRe, a lightweight three-stage Divergent-Convergent reasoning framework to fix this. (1/N)
Correction: our poster is 4-6pm, Friday, ExHall D #396. Welcome to drop by!
Attending my first CV conference ever as an NLPer! So excited to connect with more people! Check out our VISCO benchmark for VLM self-critique and correction at Poster #396, Friday, 2-4pm. We're also presenting at the BEAM workshop on Wednesday: beam-workshop2025.github.io
(1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits to text data? We are excited to announce LaViDa, one of the first and fastest large diffusion LMs for vision-language understanding!
How culturally safe are large vision-language models? LVLMs often miss the mark. We introduce CROSS, a benchmark of 1,284 image-query pairs across 16 countries & 14 languages, revealing how LVLMs violate cultural norms in context. Evaluation via CROSS-EVAL. Safety…
New blog drop! "Reflection on Knowledge Editing: Charting the Next Steps" is live! Ever wondered why knowledge editing in LLMs still feels more like a lab experiment than a real-world solution? In this post, we dive deep into where the research is thriving, and where…
Attending NAACL to present BRIEF (Friday 11am, hall 3) and Self-Routing RAG (KnowledgeNLP Workshop). Looking forward to meeting new and old friends!
#GPT4o image generation brings synthetic visual data quality to the next level. Is synthetic visual data finally ready to be used for improving VLMs? We show success with CoDA, using contrastive visual data augmentation to help teach VLMs novel and confusing concepts.
Computer Use Agent Arena is LIVE! • The easiest way to test computer-use agents in the wild without any setup • Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL, and more • Test agents on 100+ real apps & websites with one-click config • Safe & free…
New NLP seminar series alert! Check out the UCLA NLP Seminar series featuring cutting-edge talks from top researchers in NLP and related areas. Great lineup, timely topics, and open to all (Zoom)! Schedule + details: uclanlp.github.io/nlp-seminar/
Introducing Self-Routing RAG, a framework that equips selective retrieval with the ability to (1) route between multiple knowledge sources and (2) fully leverage the parametric knowledge of the LLM itself. Paper: arxiv.org/abs/2504.01018 (1/N)
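To make those two abilities concrete, here is a minimal illustrative sketch of selective retrieval with source routing, not the actual Self-Routing RAG implementation; generate and the per-source retrievers are hypothetical caller-supplied functions.

```python
from typing import Callable, Dict

def route_and_answer(
    query: str,
    generate: Callable[[str], str],               # LLM call: prompt -> text
    retrievers: Dict[str, Callable[[str], str]],  # knowledge source name -> retriever
) -> str:
    # Let the LLM itself decide whether retrieval is needed and, if so, from which source.
    options = ", ".join(["none"] + sorted(retrievers))
    choice = generate(
        f"Question: {query}\n"
        f"Which knowledge source should be consulted? Reply with exactly one of: {options}."
    ).strip().lower()

    if choice in retrievers:
        context = retrievers[choice](query)
        return generate(f"Context: {context}\nQuestion: {query}\nAnswer:")
    # Otherwise fall back to the model's own parametric knowledge (no retrieval).
    return generate(f"Question: {query}\nAnswer:")
```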
Scaling test-time compute via generative verification (GenRM) is an emerging paradigm and has been claimed to be more efficient than self-consistency (SC) for reasoning. But such claims are misleading: our compute-matched analysis shows that SC outperforms GenRM across most budgets!
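For intuition about what compute-matched means here, a simplified sketch (not the paper's exact protocol): give both methods the same call budget, under the assumption that one verifier call costs roughly as much as one solution sample. sample_solution and verify are hypothetical functions: one returns the final answer from a sampled reasoning chain, the other scores a candidate answer.

```python
from collections import Counter
from typing import Callable

def self_consistency(sample_solution: Callable[[str], str], question: str, budget: int) -> str:
    # Spend the entire budget on sampled solutions, then majority-vote the final answers.
    answers = [sample_solution(question) for _ in range(budget)]
    return Counter(answers).most_common(1)[0][0]

def generative_verification(
    sample_solution: Callable[[str], str],
    verify: Callable[[str, str], float],   # (question, answer) -> score, hypothetical verifier
    question: str,
    budget: int,
) -> str:
    # Split the same budget: half on candidate solutions, half on verifier calls.
    n_candidates = max(budget // 2, 1)
    candidates = [sample_solution(question) for _ in range(n_candidates)]
    return max(candidates, key=lambda a: verify(question, a))
```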
Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. By iteratively integrating SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%…
Check out our latest work on knowledge editing for multi-hop reasoning! Paper: arxiv.org/pdf/2503.16356 Code: github.com/zjunlp/CaKE
Introducing CaKE: Circuit-aware Knowledge Editing for LLMs! Current knowledge editing methods update single facts but struggle with multi-hop reasoning. We propose CaKE to solve this by aligning edits with the model's reasoning pathways, enabling accurate and consistent…
Video generative models hold the promise of being general-purpose simulators of the physical world. How far are we from this goal? Excited to announce VideoPhy-2, the next edition in the series, testing the physical plausibility of generated videos of real-world actions.
New findings on knowledge overshadowing! Why do LLMs hallucinate even when trained on all-true data? Can we predict hallucinations even before model training or inference? Check out our new preprint: arxiv.org/pdf/2502.16143 The Law of Knowledge Overshadowing: Towards…