Gedas Bertasius
@gberta227
Assistant Professor at @unccs, previously a postdoc at @facebookai, PhD from @Penn, a basketball enthusiast.
Excited to share our new video-language benchmark for expert-level action analysis! Most existing VLMs struggle significantly with our new benchmark, which requires a precise understanding of nuanced physical human skills. Try your VLMs and let us know how they do!
🚀 Introducing ExAct: A Video-Language Benchmark for Expert Action Analysis 🎥 3,521 expert-curated video QA pairs in 6 domains (Sports, Bike Repair, Cooking, Health, Music & Dance). 🧠 GPT‑4o scores 44.70% vs human experts at 82.02%—a huge gap! 📄Paper: arxiv.org/abs/2506.06277
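A minimal sketch of how one might try a VLM on a multiple-choice video QA benchmark like ExAct. The JSON schema, field names, and the query_vlm() stub are assumptions for illustration, not the official ExAct tooling; adapt them to the released data format.

```python
# Hypothetical evaluation loop for a multiple-choice video QA benchmark.
# Assumed data format: list of {"video", "question", "options", "answer"},
# where "answer" is the correct option letter. Adjust to the real release.
import json
import re


def query_vlm(video_path: str, prompt: str) -> str:
    """Stub: call your own VLM here and return its raw text answer."""
    raise NotImplementedError


def evaluate(qa_file: str) -> float:
    with open(qa_file) as f:
        samples = json.load(f)
    correct = 0
    for s in samples:
        options = "\n".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(s["options"]))
        prompt = f"{s['question']}\n{options}\nAnswer with a single letter."
        reply = query_vlm(s["video"], prompt)
        valid = "".join(chr(65 + i) for i in range(len(s["options"])))
        match = re.search(f"[{valid}]", reply.upper())  # parse the chosen letter
        pred = match.group(0) if match else ""
        correct += int(pred == s["answer"])
    return correct / len(samples)
```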
Check out our new paper: Video-RTS 🎥 A data-efficient RL method for complex video reasoning tasks. 🔹 Pure RL w/ output-based rewards. 🔹 Novel sparse-to-dense Test-Time Scaling (TTS) to expand input frames via self-consistency. 💥 96.4% less training data! More in the thread👇
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles…
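A hedged sketch of the sparse-to-dense test-time scaling idea described above: sample several answers on a sparse frame set and only densify the input when the samples disagree. The sample_answer() stub, frame budgets, and threshold are hypothetical placeholders, not the released Video-RTS code.

```python
# Sparse-to-dense test-time scaling via self-consistency (illustrative sketch).
from collections import Counter


def sample_answer(video, num_frames: int, seed: int) -> str:
    """Stub: run your video reasoning model on `num_frames` frames."""
    raise NotImplementedError


def sparse_to_dense_answer(video, frame_budgets=(8, 16, 32),
                           num_samples=5, agree_thresh=0.6) -> str:
    best = None
    for budget in frame_budgets:                      # start sparse, densify as needed
        answers = [sample_answer(video, budget, seed=s) for s in range(num_samples)]
        top, count = Counter(answers).most_common(1)[0]
        best = top
        if count / num_samples >= agree_thresh:       # self-consistent enough: stop early
            return top
    return best                                       # fall back to densest majority vote
```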
🚀 On the job market! Final-year PhD @ UNC Chapel Hill working on computer vision, video understanding, multimodal LLMs & AI agents. 2x Research Scientist Intern @Meta 🔍 Seeking Research Scientist/Engineer roles! 🔗 md-mohaiminul.github.io 📧 mmiemon [at] cs [dot] unc [dot] edu
Great to see our paper ReVisionLLM featured on the MCML blog! @gberta227 #CVPR2025
🚀 Check out our latest work, ReVisionLLM, now featured on the MCML blog! 🔍 A Vision-Language Model for accurate temporal grounding in hour-long videos. 👉 mcml.ai/news/2025-06-2… #VisionLanguage #MultimodalAI #MCML #CVPR2025
Come to our poster today at #CVPR2025! 🗓️ June 15 | 🕓 4–6PM 📍 Poster #282 | ExHall D 📝 Paper: arxiv.org/abs/2503.09590 🌐 Project: sites.google.com/view/bimba-mllm 💻 Code: github.com/md-mohaiminul/… 🎥 Youtube: youtu.be/YIU2XypsT-o
🚀New #CVPR2025 Paper🚀 Introducing BIMBA, an efficient multimodal LLM for long-range video QA💡 It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens using the selective scan mechanism of Mamba models. 🧵Thread below👇 arxiv.org/pdf/2503.09590
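A minimal PyTorch sketch of the general idea of compressing a long video by keeping only the most important spatiotemporal tokens before the LLM. This illustrates learned token selection only; it is not the actual BIMBA / Mamba selective-scan implementation, and the dimensions are made up.

```python
# Illustrative token selection for long-video inputs (not the BIMBA architecture).
import torch
import torch.nn as nn


class TokenSelector(nn.Module):
    def __init__(self, dim: int, keep: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # learned per-token importance
        self.keep = keep

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_frames * patches, dim) video features
        scores = self.score(tokens).squeeze(-1)            # (B, N)
        idx = scores.topk(self.keep, dim=1).indices        # top-k token indices
        idx = idx.sort(dim=1).values                       # preserve temporal order
        return torch.gather(
            tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        )


# e.g. reduce 64 frames x 196 patches down to 512 tokens for the LLM
selector = TokenSelector(dim=1024, keep=512)
compressed = selector(torch.randn(1, 64 * 196, 1024))      # -> (1, 512, 1024)
```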
Great to see a lot of interest among the video understanding community about ReVisionLLM! If you missed it, check out arxiv.org/abs/2411.14901 @hannan_tanveer
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos If you are at CVPR, please stop by 📍 Poster #307, Session 4 🗓️ June 14, 5–7PM | ExHall D 🔗 arxiv.org/pdf/2411.14901 @hannan_tanveer @gberta227
Another great accomplishment by Emon at #CVPR2025 this year. Interestingly, rather than using a complex ensemble model, Emon won the EgoSchema challenge by simply applying his latest BIMBA model, which he will also present at the poster session on Sunday, 4-6pm. Be sure to stop by!
🚀 Excited to share that we won 1st place at the EgoSchema Challenge at EgoVis, #CVPR2025! Our method (81%) outperformed human accuracy (76.2%) for the first time on this challenging task 🎯 Stop by #CVPR: 📍 Poster #282 | June 15, 4–6PM | ExHall D 🔗 sites.google.com/view/bimba-mllm
Excited to share that our paper Video ReCap (#CVPR2024) won the EgoVis Distinguished Paper Award at #CVPR2025! Honored to see our work recognized and its impact on the video understanding community. Huge thanks to my co-authors and my advisor @gberta227 🔗 sites.google.com/view/vidrecap
Very proud of this great accomplishment! Congrats @mmiemon! Well deserved!
Had a fun time attending the @unccs Computer Vision past-and-present meetup at #CVPR2025, missing a lot of folks though 😌
I will be presenting more details on SiLVR at the LOVE: Multimodal Video Agent workshop at 12:15pm CST in Room 105A!
Recent advances in test-time optimization have led to remarkable reasoning capabilities in LLMs. However, MLLMs still lag significantly, especially on complex video-language tasks. We present SiLVR, a Simple Language-based Video Reasoning framework.
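A rough sketch of what a simple language-based video reasoning pipeline could look like, assuming "language-based" means converting the video into text first and then letting a strong text-only reasoning LLM answer the question. The caption_clip() and reason_llm() stubs are hypothetical; this is not the SiLVR codebase.

```python
# Illustrative language-based video reasoning pipeline (assumed design, not SiLVR code).
def caption_clip(video, start_s: float, end_s: float) -> str:
    """Stub: caption one short clip with any off-the-shelf captioner."""
    raise NotImplementedError


def reason_llm(prompt: str) -> str:
    """Stub: call a strong text-only reasoning LLM."""
    raise NotImplementedError


def answer(video, duration_s: float, question: str, clip_len_s: float = 10.0) -> str:
    captions = []
    t = 0.0
    while t < duration_s:                      # describe the video clip by clip
        end = min(t + clip_len_s, duration_s)
        captions.append(f"[{t:.0f}-{end:.0f}s] {caption_clip(video, t, end)}")
        t = end
    prompt = ("Video description:\n" + "\n".join(captions) +
              f"\n\nQuestion: {question}\nThink step by step, then answer.")
    return reason_llm(prompt)
```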
Happening today at 1:20pm CST in Rooms 209 A-C!
@CVPR is around the corner!! Join us at the Workshop on T4V at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yang4ai, @WenhuChen, @roeiherzig, Yuheng Li, Kristen Grauman) covering diverse topics! Website: sites.google.com/view/t4v-cvpr2… #CVPR #Transformer #Vision #T4V2025 #T4V
Excited to present VideoTree🌲 at #CVPR2025 Fri at 10:30AM! VideoTree improves long-video QA via smart sampling: ▶️ Query-adaptive: finds the parts of the video relevant to the query ▶️ Coarse-to-fine: organizes the video hierarchically so relevant segments can be sampled at finer granularity
🚨 Introducing VideoTree! Captioning + LLMs can perform well on long-video QA, but dense frame captioning leads to inefficiency (redundancy) and sub-optimality (irrelevance). VideoTree addresses these issues & improves LLM-based long-video QA by: ▶️ Structured Video…
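A hedged sketch of the query-adaptive, coarse-to-fine sampling idea described above: caption a coarse set of frames, keep only the segments relevant to the query, and caption those segments more densely before handing everything to an LLM. The caption_frame() and relevance() stubs, step sizes, and threshold are illustrative placeholders, not the released VideoTree code.

```python
# Query-adaptive coarse-to-fine captioning for long-video QA (illustrative sketch).
def caption_frame(video, t: float) -> str:
    """Stub: caption a single frame at time t (seconds)."""
    raise NotImplementedError


def relevance(query: str, caption: str) -> float:
    """Stub: e.g. cosine similarity between text embeddings of query and caption."""
    raise NotImplementedError


def coarse_to_fine_captions(video, duration_s: float, query: str,
                            coarse_step=60.0, fine_step=10.0, thresh=0.5):
    captions = []
    t = 0.0
    while t < duration_s:
        cap = caption_frame(video, t)
        if relevance(query, cap) >= thresh:          # query-relevant segment: sample finely
            s = t
            while s < min(t + coarse_step, duration_s):
                captions.append((s, caption_frame(video, s)))
                s += fine_step
        else:                                        # irrelevant segment: keep coarse caption
            captions.append((t, cap))
        t += coarse_step
    return captions
```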
Big news! 🎉 I’m joining UNC-Chapel Hill as an Assistant Professor in Computer Science starting next year! Before that, I’ll be spending time @OpenAI working on LLM privacy. @unccs @uncnlp
Join us for the 4th iteration of Transformers for Vision (T4V) workshop on Thursday!