Gedas Bertasius
@gberta227
Assistant Professor at @unccs, previously a postdoc at @facebookai, PhD from @Penn, a basketball enthusiast.
Excited to share our new video-language benchmark for expert-level action analysis! Most existing VLMs struggle significantly with our new benchmark, which requires a precise understanding of nuanced physical human skills. Try your VLMs and let us know how they do!
🚀 Introducing ExAct: A Video-Language Benchmark for Expert Action Analysis 🎥 3,521 expert-curated video QA pairs in 6 domains (Sports, Bike Repair, Cooking, Health, Music & Dance). 🧠 GPT‑4o scores 44.70% vs human experts at 82.02%—a huge gap! 📄Paper: arxiv.org/abs/2506.06277
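A minimal sketch of how one might try a VLM on a multiple-choice video QA benchmark like ExAct. The JSON schema, field names, and the query_vlm() stub are assumptions for illustration, not the official ExAct tooling; adapt them to the released data format.

```python
# Hypothetical evaluation loop for a multiple-choice video QA benchmark.
# Assumed data format: list of {"video", "question", "options", "answer"},
# where "answer" is the correct option letter. Adjust to the real release.
import json
import re


def query_vlm(video_path: str, prompt: str) -> str:
    """Stub: call your own VLM here and return its raw text answer."""
    raise NotImplementedError


def evaluate(qa_file: str) -> float:
    with open(qa_file) as f:
        samples = json.load(f)
    correct = 0
    for s in samples:
        options = "\n".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(s["options"]))
        prompt = f"{s['question']}\n{options}\nAnswer with a single letter."
        reply = query_vlm(s["video"], prompt)
        valid = "".join(chr(65 + i) for i in range(len(s["options"])))
        match = re.search(f"[{valid}]", reply.upper())  # parse the chosen letter
        pred = match.group(0) if match else ""
        correct += int(pred == s["answer"])
    return correct / len(samples)
```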
Check out our new paper: Video-RTS 🎥 A data-efficient RL method for complex video reasoning tasks. 🔹 Pure RL w/ output-based rewards. 🔹 Novel sparse-to-dense Test-Time Scaling (TTS) to expand input frames via self-consistency. 💥 96.4% less training data! More in the thread👇
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles…
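A hedged sketch of the sparse-to-dense test-time scaling idea described above: sample several answers on a sparse frame set and only densify the input when the samples disagree. The sample_answer() stub, frame budgets, and threshold are hypothetical placeholders, not the released Video-RTS code.

```python
# Sparse-to-dense test-time scaling via self-consistency (illustrative sketch).
from collections import Counter


def sample_answer(video, num_frames: int, seed: int) -> str:
    """Stub: run your video reasoning model on `num_frames` frames."""
    raise NotImplementedError


def sparse_to_dense_answer(video, frame_budgets=(8, 16, 32),
                           num_samples=5, agree_thresh=0.6) -> str:
    best = None
    for budget in frame_budgets:                      # start sparse, densify as needed
        answers = [sample_answer(video, budget, seed=s) for s in range(num_samples)]
        top, count = Counter(answers).most_common(1)[0]
        best = top
        if count / num_samples >= agree_thresh:       # self-consistent enough: stop early
            return top
    return best                                       # fall back to densest majority vote
```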
🚀 On the job market! Final-year PhD @ UNC Chapel Hill working on computer vision, video understanding, multimodal LLMs & AI agents. 2x Research Scientist Intern @Meta 🔍 Seeking Research Scientist/Engineer roles! 🔗 md-mohaiminul.github.io 📧 mmiemon [at] cs [dot] unc [dot] edu
Great to see our paper ReVisionLLM featured on the MCML blog! @gberta227 #CVPR2025
🚀 Check out our latest work, ReVisionLLM, now featured on the MCML blog! 🔍 A Vision-Language Model for accurate temporal grounding in hour-long videos. 👉 mcml.ai/news/2025-06-2… #VisionLanguage #MultimodalAI #MCML #CVPR2025
Come to our poster today at #CVPR2025! 🗓️ June 15 | 🕓 4–6PM 📍 Poster #282 | ExHall D 📝 Paper: arxiv.org/abs/2503.09590 🌐 Project: sites.google.com/view/bimba-mllm 💻 Code: github.com/md-mohaiminul/… 🎥 Youtube: youtu.be/YIU2XypsT-o
🚀New #CVPR2025 Paper🚀 Introducing BIMBA, an efficient multimodal LLM for long-range video QA💡 It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens using the selective scan mechanism of Mamba models. 🧵Thread below👇 arxiv.org/pdf/2503.09590
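A minimal PyTorch sketch of the general idea of compressing a long video by keeping only the most important spatiotemporal tokens before the LLM. This illustrates learned token selection only; it is not the actual BIMBA / Mamba selective-scan implementation, and the dimensions are made up.

```python
# Illustrative token selection for long-video inputs (not the BIMBA architecture).
import torch
import torch.nn as nn


class TokenSelector(nn.Module):
    def __init__(self, dim: int, keep: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # learned per-token importance
        self.keep = keep

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_frames * patches, dim) video features
        scores = self.score(tokens).squeeze(-1)            # (B, N)
        idx = scores.topk(self.keep, dim=1).indices        # top-k token indices
        idx = idx.sort(dim=1).values                       # preserve temporal order
        return torch.gather(
            tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        )


# e.g. reduce 64 frames x 196 patches down to 512 tokens for the LLM
selector = TokenSelector(dim=1024, keep=512)
compressed = selector(torch.randn(1, 64 * 196, 1024))      # -> (1, 512, 1024)
```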
Great to see a lot of interest among the video understanding community about ReVisionLLM! If you missed it, check out arxiv.org/abs/2411.14901 @hannan_tanveer
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos If you are at CVPR, please stop by 📍 Poster #307, Session 4 🗓️ June 14, 5–7PM | ExHall D 🔗 arxiv.org/pdf/2411.14901 @hannan_tanveer @gberta227
Another great accomplishment by Emon at #CVPR2025 this year. Interestingly, rather than using a complex ensemble model, Emon won the EgoSchema challenge by simply applying his latest BIMBA model, which he will also present at the poster session on Sunday, 4-6pm. Be sure to stop by!
🚀 Excited to share that we won 1st place at the EgoSchema Challenge at EgoVis, #CVPR2025! Our method (81%) outperformed human accuracy (76.2%) for the first time on this challenging task 🎯 Stop by #CVPR: 📍 Poster #282 | June 15, 4–6PM | ExHall D 🔗 sites.google.com/view/bimba-mllm
Excited to share that our paper Video ReCap (#CVPR2024) won the EgoVis Distinguished Paper Award at #CVPR2025! Honored to see our work recognized and its impact on the video understanding community. Huge thanks to my co-authors and my advisor @gberta227 🔗 sites.google.com/view/vidrecap
Very proud of this great accomplishment! Congrats @mmiemon! Well deserved!
Had a fun time attending the @unccs Computer Vision past-and-present meetup at #CVPR2025, missing a lot of folks though 😌
I will be presenting more details on SiLVR at the LOVE: Multimodal Video Agent workshop at 12:15pm CST in Room 105A!
Recent advances in test-time optimization have led to remarkable reasoning capabilities in LLMs. However, MLLMs still lag significantly, especially on complex video-language tasks. We present SiLVR, a Simple Language-based Video Reasoning framework.
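A rough sketch of what a simple language-based video reasoning pipeline could look like, assuming "language-based" means converting the video into text first and then letting a strong text-only reasoning LLM answer the question. The caption_clip() and reason_llm() stubs are hypothetical; this is not the SiLVR codebase.

```python
# Illustrative language-based video reasoning pipeline (assumed design, not SiLVR code).
def caption_clip(video, start_s: float, end_s: float) -> str:
    """Stub: caption one short clip with any off-the-shelf captioner."""
    raise NotImplementedError


def reason_llm(prompt: str) -> str:
    """Stub: call a strong text-only reasoning LLM."""
    raise NotImplementedError


def answer(video, duration_s: float, question: str, clip_len_s: float = 10.0) -> str:
    captions = []
    t = 0.0
    while t < duration_s:                      # describe the video clip by clip
        end = min(t + clip_len_s, duration_s)
        captions.append(f"[{t:.0f}-{end:.0f}s] {caption_clip(video, t, end)}")
        t = end
    prompt = ("Video description:\n" + "\n".join(captions) +
              f"\n\nQuestion: {question}\nThink step by step, then answer.")
    return reason_llm(prompt)
```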
Happening today at 1:20pm CST in Rooms 209 A-C!
@CVPR is around the corner!! Join us at the Workshop on T4V at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yang4ai, @WenhuChen, @roeiherzig, Yuheng Li, Kristen Grauman) covering diverse topics! Website: sites.google.com/view/t4v-cvpr2… #CVPR #Transformer #Vision #T4V2025 #T4V
Excited to present VideoTree🌲 at #CVPR2025 Fri at 10:30AM! VideoTree improves long-video QA via smart sampling: ▶️ Query-adaptive: finds the parts of the video relevant to the query ▶️ Coarse-to-fine: organizes the video hierarchically so relevant segments can be sampled at finer granularity
🚨 Introducing VideoTree! Captioning + LLMs can perform well on long-video QA, but dense frame captioning leads to inefficiency (redundancy) and sub-optimality (irrelevance). VideoTree addresses these issues & improves LLM-based long-video QA by: ▶️ Structured Video…
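A hedged sketch of the query-adaptive, coarse-to-fine sampling idea described above: caption a coarse set of frames, keep only the segments relevant to the query, and caption those segments more densely before handing everything to an LLM. The caption_frame() and relevance() stubs, step sizes, and threshold are illustrative placeholders, not the released VideoTree code.

```python
# Query-adaptive coarse-to-fine captioning for long-video QA (illustrative sketch).
def caption_frame(video, t: float) -> str:
    """Stub: caption a single frame at time t (seconds)."""
    raise NotImplementedError


def relevance(query: str, caption: str) -> float:
    """Stub: e.g. cosine similarity between text embeddings of query and caption."""
    raise NotImplementedError


def coarse_to_fine_captions(video, duration_s: float, query: str,
                            coarse_step=60.0, fine_step=10.0, thresh=0.5):
    captions = []
    t = 0.0
    while t < duration_s:
        cap = caption_frame(video, t)
        if relevance(query, cap) >= thresh:          # query-relevant segment: sample finely
            s = t
            while s < min(t + coarse_step, duration_s):
                captions.append((s, caption_frame(video, s)))
                s += fine_step
        else:                                        # irrelevant segment: keep coarse caption
            captions.append((t, cap))
        t += coarse_step
    return captions
```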
Big news! 🎉 I’m joining UNC-Chapel Hill as an Assistant Professor in Computer Science starting next year! Before that, I’ll be spending time @OpenAI working on LLM privacy. @unccs @uncnlp
Join us for the 4th iteration of Transformers for Vision (T4V) workshop on Thursday!