Ce Zhang
@cezhhh
CS PhD student at UNC Chapel Hill.
🚨 New #CVPR2025 Paper 🚨 🏀BASKET: A Large-Scale Dataset for Fine-Grained Basketball Skill Estimation🎥 4,477 hours of videos⏱️ | 32,232 players⛹️ | 20 fine-grained skills🎯 We present a new video dataset for skill estimation with unprecedented scale and diversity! A thread👇
Excited to share that LLoVi is accepted to #EMNLP2024. We will present our work in poster session 12, Nov. 14 (Thu.) 14:00-15:30 ET. Happy to have a chat! Check out our paper at: arxiv.org/pdf/2312.17235 Code: github.com/CeeZh/LLoVi Website: sites.google.com/cs.unc.edu/llo…
First, LLoVi uses a short-term visual captioner to generate textual descriptions of short video clips (0.5-8s in length) densely sampled from a long input video. Afterward, an LLM aggregates the short-term captions to perform long-range temporal reasoning.
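A minimal sketch of this two-stage idea in Python, assuming hypothetical `caption_clip` and `query_llm` helpers standing in for an off-the-shelf captioner and an LLM API (this is not the official implementation; see github.com/CeeZh/LLoVi for that):

```python
# Sketch of the two-stage LLoVi pipeline described above.
# `caption_clip` and `query_llm` are hypothetical stand-ins, not real LLoVi APIs.

from typing import Callable, List


def llovi_answer(
    video_clips: List[str],               # paths to short (0.5-8s) clips densely sampled from the long video
    question: str,
    caption_clip: Callable[[str], str],   # assumed: clip path -> short textual description
    query_llm: Callable[[str], str],      # assumed: prompt -> LLM response
) -> str:
    # Stage 1: short-term visual captioning of each sampled clip.
    captions = [caption_clip(clip) for clip in video_clips]

    # Stage 2: the LLM aggregates the short-term captions to perform
    # long-range temporal reasoning over the whole video.
    prompt = (
        "You are given captions of consecutive short clips from one long video:\n"
        + "\n".join(f"{i}. {c}" for i, c in enumerate(captions))
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return query_llm(prompt)
```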
(0/7) #ICLR2024 How can LLMs benefit video action forecasting? Excited to share our ICLR 2024 paper: AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? paper page: huggingface.co/papers/2307.16… Can we better anticipate an actor's future actions (e.g. mix eggs) by knowing what commonly happens after his/her current action (e.g. crack eggs)? What if we…
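To illustrate the intuition only: one could prompt an LLM with the actions observed so far and ask what commonly happens next. A rough sketch, assuming a hypothetical `query_llm` call (this is not the actual AntGPT pipeline):

```python
# Illustrative sketch: ask an LLM which actions typically follow the observed ones.
# `query_llm` is a hypothetical LLM call, not the AntGPT implementation.

from typing import Callable, List


def anticipate_actions(
    observed_actions: List[str],     # e.g. ["crack eggs"]
    num_future: int,
    query_llm: Callable[[str], str],
) -> List[str]:
    prompt = (
        "The actions observed so far are: "
        + ", ".join(observed_actions)
        + f". List the {num_future} actions that most commonly happen next, one per line."
    )
    response = query_llm(prompt)
    # Parse one predicted action per line (e.g. "mix eggs").
    return [line.strip() for line in response.splitlines() if line.strip()][:num_future]
```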
The 3rd Transformers for Vision workshop will be back at #CVPR2024! We have a great speaker lineup covering diverse Transformer topics! Papers Due: Apr 15. Website: sites.google.com/view/t4v-cvpr24 Organized w/ @_rohitgirdhar_, @ZhidingYu, @giffmana, @gulvarol, @mohitban47 and others!
The code is now publicly available at github.com/CeeZh/LLoVi!
Check out our recent work on long-range video understanding using LLMs! Our simple framework, dubbed LLoVi, outperforms prior approaches on the new EgoSchema long-range videoQA benchmark by 18% (absolute gain). More details 👇