Han Wang
@HanWang98
PhD student @unc @unccs @unc_ai_group; Intern @AMD; Formerly @AmazonScience @MSFTResearch @NlpWestlake. RT & like ≠ endorsements. Views are my own. He/him
🚨Real-world retrieval is messy: queries can be ambiguous, and documents may conflict or contain incorrect/irrelevant info. How can we jointly address all these problems? We introduce: ➡️ RAMDocs, a challenging dataset with ambiguity, misinformation, and noise. ➡️ MADAM-RAG, a…
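For readers skimming the thread, here is a minimal sketch of the multi-agent debate idea behind MADAM-RAG; the prompts, round count, and aggregation step are my own illustrative assumptions, not the paper's exact procedure: one agent answers from each retrieved document, agents see each other's answers across a few rounds, and an aggregator reconciles them.

```python
# Minimal sketch of multi-agent debate over conflicting retrieved documents.
# `call_llm` is a placeholder for any chat/completions backend; the prompts and
# aggregation rule are illustrative assumptions, not the paper's exact method.
from typing import Callable, List

def call_llm(prompt: str) -> str:
    """Stub LLM call; replace with a real client."""
    return "stub answer"

def debate_rag(query: str, documents: List[str], rounds: int = 2,
               llm: Callable[[str], str] = call_llm) -> str:
    """One agent per document answers from its own evidence, sees the other
    agents' previous answers, and may revise; an aggregator reconciles them."""
    answers = ["" for _ in documents]
    for _ in range(rounds):
        new_answers = []
        for i, doc in enumerate(documents):
            others = "\n".join(a for j, a in enumerate(answers) if j != i and a)
            prompt = (
                f"Question: {query}\n"
                f"Your evidence document:\n{doc}\n"
                f"Other agents' current answers:\n{others or '(none yet)'}\n"
                "Answer using ONLY your document; flag it if your document looks "
                "irrelevant or contradicts the others."
            )
            new_answers.append(llm(prompt))
        answers = new_answers
    aggregate = (
        f"Question: {query}\nPer-document answers:\n"
        + "\n".join(f"- {a}" for a in answers)
        + "\nReconcile them: keep all valid answers for ambiguous questions; "
          "discard answers based on misinformation or noise."
    )
    return llm(aggregate)

print(debate_rag("Who founded the company?", ["doc A text", "doc B text"]))
```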

🎉 Our paper, GenerationPrograms, which proposes a modular framework for attributable text generation, has been accepted to @COLM_conf! GenerationPrograms produces a program that executes to text, providing an auditable trace of how the text was generated and major gains on…
Excited to share GenerationPrograms! 🚀 How do we get LLMs to cite their sources? GenerationPrograms is attributable by design, producing a program that executes to text w/ a trace of how the text was generated! Gains of up to +39 Attribution F1, and it eliminates uncited sentences,…
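As a rough illustration of "a program that executes to text": the model emits a small program of text operations over cited source sentences, so every generated sentence carries an explicit trace back to its inputs. The operator names and data structures below are my own simplification, not GenerationPrograms' actual module set.

```python
# Toy illustration of attributable-by-design generation: the "program" is a
# list of operations over cited source sentences, and executing it yields both
# the text and a trace of which sources produced each sentence. Operator names
# (paraphrase/fusion) are illustrative, not the paper's actual module set.
from dataclasses import dataclass
from typing import Dict, List, Tuple

SOURCES: Dict[str, str] = {
    "S1": "The study reports a 12% improvement on the benchmark.",
    "S2": "The method uses retrieval to ground its outputs.",
}

@dataclass
class Op:
    name: str          # e.g. "paraphrase" or "fusion"
    inputs: List[str]  # IDs of the source sentences it cites

def execute(program: List[Op]) -> Tuple[str, List[Tuple[str, List[str]]]]:
    """Execute each op into a sentence (stubbed) and record its citation trace."""
    sentences, trace = [], []
    for op in program:
        evidence = " ".join(SOURCES[i] for i in op.inputs)
        sentence = f"[{op.name}] {evidence}"  # a real system would call an LLM module here
        sentences.append(sentence)
        trace.append((sentence, op.inputs))   # every sentence is tied to its cited inputs
    return " ".join(sentences), trace

text, trace = execute([Op("fusion", ["S1", "S2"])])
print(text)
print(trace)  # auditable: no output sentence exists without cited sources
```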
🎉 Glad to see our work on handling conflicting & noisy evidence and ambiguous queries in RAG systems (via a new benchmark & multi-agent debate method) has been accepted to #COLM2025 @COLM_conf!! 🇨🇦 Congrats to Han on leading this effort. More details in the thread below and…
🥳 Excited to share that our work -- Retrieval-Augmented Generation with Conflicting Evidence -- on addressing conflict in RAG due to ambiguity, misinformation, and noisy/irrelevant evidence has been accepted to @COLM_conf #COLM2025! Our new benchmark RAMDocs proves challenging for…
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles…
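Reading "adaptive video TTS" as adaptive test-time scaling, one plausible control rule (my assumption, not necessarily Video-RTS's policy) is to start with few frames and few sampled reasoning chains, and escalate compute only when the sampled answers disagree:

```python
# Sketch of adaptive test-time scaling for video QA: start cheap (few frames,
# few sampled chains) and spend more compute only when answers disagree.
# The escalation rule and thresholds are my assumptions, not Video-RTS's policy.
import random
from collections import Counter
from typing import Sequence

def answer_once(question: str, num_frames: int) -> str:
    """Stub for one sampled reasoning chain over `num_frames` frames."""
    return random.choice(["A", "B"])  # replace with a real video-LLM call

def adaptive_tts(question: str, frame_budgets: Sequence[int] = (8, 16, 32),
                 samples_per_round: int = 3, agree_threshold: float = 0.67) -> str:
    best = ""
    for num_frames in frame_budgets:
        answers = [answer_once(question, num_frames) for _ in range(samples_per_round)]
        best, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= agree_threshold:
            return best  # sampled chains agree -- stop scaling
    return best          # otherwise fall back to the last round's majority vote

print(adaptive_tts("What happens after the goal is scored?"))
```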
🥳Excited to share that I’ll be joining @unccs as a postdoc this fall. Looking forward to working with @mohitban47 & the amazing students at @unc_ai_group. I'll continue working on retrieval, aligning knowledge modules with LLMs' parametric knowledge, and expanding to various modalities.
🥳 Excited to share that I’ll be joining the CS Department at UNC-Chapel Hill (@unccs @unc_ai_group) as an Assistant Professor starting Fall 2026! Before that, I’ll be working at Ai2 Prior (@allen_ai @Ai2Prior) and UW (@uwcse) on multimodal understanding and generation.
🎉 Excited to share that CAPTURe has been accepted to #ICCV2025! CAPTURe is a new benchmark for VLM reasoning that requires completing patterns to count objects that are occluded from view. We find that SOTA VLMs struggle with both counting and reasoning about partial patterns!…
Check out 🚨CAPTURe🚨 -- a new benchmark and task testing spatial reasoning by making VLMs count objects under occlusion. Key Takeaways: ➡️ SOTA VLMs (GPT-4o, Qwen2-VL, Intern-VL2) have high error rates on CAPTURe (but humans get very low error ✅) and models struggle to reason…
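For context, a generic way to score such a counting benchmark is mean relative error between predicted and ground-truth counts; the exact metric in the CAPTURe paper may differ, so treat this as an illustrative sketch only.

```python
# Illustrative scoring for a counting-under-occlusion benchmark: mean relative
# error between predicted and ground-truth counts (not necessarily the exact
# metric used in the CAPTURe paper).
from typing import List, Tuple

def counting_error(pairs: List[Tuple[int, int]]) -> float:
    """Mean relative counting error over (predicted, ground-truth) pairs."""
    return sum(abs(p - g) / max(g, 1) for p, g in pairs) / len(pairs)

print(counting_error([(9, 12), (5, 5), (20, 16)]))  # ~0.17
```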
🎉 Excited to announce VEGGIE has been accepted to #ICCV2025! VEGGIE is a unified MLLM + Diffusion framework for instructional video editing. It presents a systematic approach spanning data, model, benchmark, and evaluation design, and shows strong multi-skill editing +…
🚨 Introducing VEGGIE 🥦, a unified, end-to-end, and versatile instructional video generative model. Current video editing methods struggle with: 1. Understanding direct user instructions 2. Handling diverse editing skills in one model 3. Balancing multiple training…
New paper Alert 🚨 Introducing MEXA: A general and training-free multimodal reasoning framework via dynamic multi-expert skill selection, aggregation and deep reasoning! MEXA: 1. Selects task- and modality-relevant experts based on the query and various required multimodal…
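A hedged sketch of the select-then-aggregate loop described above; the expert registry, skill-overlap scoring, and aggregation prompt are illustrative assumptions, not MEXA's actual components.

```python
# Hedged sketch of training-free multi-expert reasoning: pick experts whose
# declared skills match the query, run them, then let a reasoner aggregate
# their outputs. The registry, scoring, and prompts are illustrative only.
from typing import Callable, Dict, List, Set

EXPERTS: Dict[str, Dict] = {
    "ocr":     {"skills": {"text", "document"}, "run": lambda q: "OCR output (stub)"},
    "caption": {"skills": {"image", "scene"},   "run": lambda q: "Caption output (stub)"},
    "audio":   {"skills": {"speech", "sound"},  "run": lambda q: "ASR output (stub)"},
}

def select_experts(required_skills: Set[str], top_k: int = 2) -> List[str]:
    scored = {name: len(required_skills & cfg["skills"]) for name, cfg in EXPERTS.items()}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [name for name in ranked if scored[name] > 0][:top_k]

def multi_expert_answer(query: str, required_skills: Set[str],
                        reason: Callable[[str], str] = lambda p: f"(stub reasoning over)\n{p}") -> str:
    outputs = {name: EXPERTS[name]["run"](query) for name in select_experts(required_skills)}
    prompt = (f"Question: {query}\nExpert outputs:\n"
              + "\n".join(f"- {n}: {o}" for n, o in outputs.items())
              + "\nReason step by step and answer.")
    return reason(prompt)

print(multi_expert_answer("What does the sign in the video say?", {"text", "image"}))
```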
🚨 Want models to better utilize and ground their outputs in the provided knowledge? We introduce Context-INformed Grounding Supervision (CINGS)! Training LLMs with CINGS significantly boosts grounding abilities in both text and vision-language models compared to standard instruction tuning.
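The tweet doesn't spell out the training recipe, so the following is only a plausible minimal version of context-informed supervision: prepend grounding context to each instruction-tuning example and supervise only the response tokens. Treat the whole sketch as an assumption, not the actual CINGS procedure.

```python
# Plausible minimal recipe for context-informed supervision (an assumption,
# not necessarily the exact CINGS procedure): prepend grounding context to each
# instruction-tuning example and supervise only the response tokens.
from typing import Dict, List

def build_example(context: str, instruction: str, response: str,
                  tokenize=lambda s: s.split()) -> Dict[str, List[str]]:
    prompt_tokens = tokenize(f"Context: {context}\nInstruction: {instruction}\nResponse:")
    response_tokens = tokenize(response)
    return {
        "input_ids": prompt_tokens + response_tokens,
        # Loss is masked over the prompt (real setups put -100 label ids there),
        # so the model is supervised only on the response, conditioned on context.
        "labels": ["<ignore>"] * len(prompt_tokens) + response_tokens,
    }

print(build_example("Paris is the capital of France.", "Name France's capital.", "Paris."))
```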
Thanks for discovering + sharing our work on contextualized late-interaction based multimodal content retrieval, Omar! (and ColBERT is awesome of course) 😀
Wow I missed this extra fancy ColBERT model. > A late-interaction retriever which jointly encodes/contextualizes information from many modalities, allowing for fine-grained matching with the query while implicitly finding the most relevant modality.
Excited to share our new work, CLaMR! 🚀 We tackle multimodal content retrieval by jointly considering video, speech, OCR, and metadata. CLaMR learns to dynamically pick the right modality for your query, boosting retrieval by 25 nDCG@10 over single modality retrieval! 🧐…
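For those unfamiliar with late interaction: the sketch below shows ColBERT-style MaxSim scoring over a document whose tokens come from several modalities (frames, ASR, OCR, metadata), which also yields an implicit signal about which modality each query token matched. The random toy embeddings stand in for CLaMR's trained contextualized encoder.

```python
# ColBERT-style late interaction over a multimodal document: each modality
# contributes token embeddings, the query-document score is the usual MaxSim
# sum, and the argmax reveals which modality each query token matched.
# Random toy vectors stand in for CLaMR's trained contextualized encoder.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
query_tokens = rng.normal(size=(4, dim))          # [num_query_tokens, dim]
doc_modalities = {                                # per-modality token embeddings
    "frames": rng.normal(size=(6, dim)),
    "asr":    rng.normal(size=(5, dim)),
    "ocr":    rng.normal(size=(3, dim)),
    "meta":   rng.normal(size=(2, dim)),
}

doc_tokens = np.concatenate(list(doc_modalities.values()))               # joint encoding
modality_of = np.concatenate([[m] * len(v) for m, v in doc_modalities.items()])

sims = query_tokens @ doc_tokens.T                # token-level similarity matrix [Q, D]
score = sims.max(axis=1).sum()                    # MaxSim: best doc token per query token
matched = modality_of[sims.argmax(axis=1)]        # implicit "most relevant modality" signal

print(f"late-interaction score: {score:.2f}")
print("modality matched by each query token:", list(matched))
```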
Excited to present VideoTree🌲 at #CVPR2025 Fri at 10:30AM! VideoTree improves long-video QA via smart sampling: -Query-adaptive: finds the parts of the video relevant to the query -Coarse-to-fine structure: structured hierarchically to sample granularly from relevant segments
🚨 Introducing VideoTree! Captioning + LLMs can perform well on long-video QA, but dense frame captioning leads to inefficiency (redundancy) and sub-optimality (irrelevance). VideoTree addresses these issues & improves LLM-based long-video QA by: ▶️ Structured Video…
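A minimal sketch of query-adaptive, coarse-to-fine frame selection in the spirit of VideoTree; the relevance stub, branching factors, and segment scheme are my assumptions, not the paper's clustering/captioning pipeline.

```python
# Sketch of query-adaptive, coarse-to-fine frame selection: score coarse
# segments for relevance to the question and sample densely only inside the
# top segments. Relevance stub and branching factors are illustrative, not
# VideoTree's actual clustering/captioning pipeline.
from typing import Callable, List

def select_frames(num_frames: int, query: str,
                  relevance: Callable[[int, str], float],
                  coarse_segments: int = 8, fine_per_segment: int = 4,
                  keep_top: int = 2) -> List[int]:
    seg_len = max(num_frames // coarse_segments, 1)
    segments = [(s, min(s + seg_len, num_frames)) for s in range(0, num_frames, seg_len)]
    # Coarse pass: score one representative (middle) frame per segment.
    top = sorted(segments, key=lambda seg: relevance((seg[0] + seg[1]) // 2, query),
                 reverse=True)[:keep_top]
    # Fine pass: sample more densely only inside the relevant segments.
    frames: List[int] = []
    for start, end in top:
        step = max((end - start) // fine_per_segment, 1)
        frames.extend(range(start, end, step))
    return sorted(frames)

# Toy relevance: pretend the answer lives near frame 900 of a 1000-frame video.
print(select_frames(1000, "what happens at the end?", lambda f, q: -abs(f - 900)))
```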
Introducing CLaMR -- a late-interaction retriever for complex multimodal video content! 📽️📚 ➡️ Jointly encodes frames, speech, on-screen text, and metadata to answer diverse queries grounded across modalities ➡️ Trained with a new dataset we introduce, MultiVENT 2.0++, a…
How can a multimodal retriever accurately retrieve docs from massive online video content that spans multiple modalities? We introduce CLaMR, a contextualized late-interaction retriever that jointly encodes all modalities and dynamically selects those containing the relevant…
Excited to announce CLaMR, our new retriever for multimodal documents! Strong performance improvements (+25 nDCG@10) compared to both multimodal and unimodal retrieval baselines. 🤝 CLaMR jointly encodes multiple modalities and selects the most relevant ones for each query. 🏋️‍♂️…
Excited to share Video-Skill-CoT🎬🛠️– a new framework for domain-adaptive video reasoning with skill-aware Chain-of-Thought (CoT) supervision! ⚡️Key Highlights: ➡️ Automatically extracts domain-specific reasoning skills from questions and organizes them into a unified taxonomy,…
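A hedged sketch of the skill-extraction step: embed the training questions, cluster them, and treat each cluster as a reasoning skill for skill-aware CoT supervision. The hashed-trigram embedding and plain k-means below are stand-ins for whatever Video-Skill-CoT actually uses.

```python
# Sketch of building a skill taxonomy from training questions: embed questions,
# cluster them, and treat each cluster as a reasoning "skill" for skill-aware
# CoT supervision. The hashed-trigram embedding and plain k-means below are
# stand-ins, not Video-Skill-CoT's actual pipeline.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Cheap hashed character-trigram embedding (placeholder for a real encoder)."""
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[hash(text[i:i + 3]) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def cluster_skills(questions, k: int = 2, iters: int = 10, seed: int = 0) -> np.ndarray:
    X = np.stack([embed(q) for q in questions])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):  # plain k-means
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (assign == c).any():
                centers[c] = X[assign == c].mean(axis=0)
    return assign

qs = ["How many players are on the court?", "How many cars pass the sign?",
      "Why does the chef add salt first?", "Why does the player get a red card?"]
print(cluster_skills(qs))  # e.g. counting questions vs. causal "why" questions
```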