Jae Sung Park
@jjaesungPark
🎓 PhD student at UW advised by Yejin Choi and Ali Farhadi
Having trouble dealing with the excessive number of tokens when processing a video? Check out our paper, accepted to ICCV 2025 with an average score of 5.5! We tokenize video with tokens grounded in the trajectories of all objects rather than fixed-size patches. Trained with a…
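Roughly, the idea in code (a minimal sketch, not the paper's actual pipeline): given per-frame features and per-object trajectory masks from an off-the-shelf tracker, pool features along each trajectory so every object contributes one token instead of a grid of patch tokens. The function name, the mean pooling, and the random toy inputs below are all illustrative assumptions.

```python
import torch

def trajectory_tokens(frame_feats, traj_masks):
    """Pool per-frame features along each object trajectory into one token.

    frame_feats: (T, H, W, D) dense features for T frames.
    traj_masks:  (N, T, H, W) binary masks, one trajectory per tracked object
                 (assumed to come from an off-the-shelf tracker).
    Returns (N, D): one token per object, instead of T*H*W patch tokens.
    """
    feats = frame_feats[None]                       # (1, T, H, W, D)
    masks = traj_masks[..., None].float()           # (N, T, H, W, 1)
    summed = (feats * masks).sum(dim=(1, 2, 3))     # (N, D) masked feature sum
    counts = masks.sum(dim=(1, 2, 3)).clamp(min=1)  # (N, 1) pixels per trajectory
    return summed / counts                          # mean-pool along each trajectory

# Toy usage: 8 frames of 16x16 features, 5 tracked objects.
T, H, W, D, N = 8, 16, 16, 64, 5
tokens = trajectory_tokens(torch.randn(T, H, W, D),
                           torch.rand(N, T, H, W) > 0.5)
print(tokens.shape)  # torch.Size([5, 64])
```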
Check out our Molmo project in the poster session as well @CVPR!
Molmo won the Best Paper Honorable Mention award @CVPR! This work was a long journey over 1.5 years, from failing to get strong performance with massive-scale, low-quality data to focusing on modest-scale, extremely high-quality data! Proud to see what it became. #CVPR2025
Actions in specialized domains have lots of nuances and often appear similar. Can VLMs recognize these nuances in videos? 🎥🤔 Our NeurIPS D&B paper shows Gemini and GPT-4o score only 35% and 45% on complex actions in our benchmark. arxiv.org/abs/2410.05774 🧵 (1/n)
Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it…
Announcing Superposed Decoding 🦸: a decoding method to generate multiple completions in one LM inference pass! Superposed Decoding can power applications from code suggestions to email autocomplete. 📜: arxiv.org/abs/2405.18400 💻: github.com/RAIVNLab/Super… Here’s a quick overview👇
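The core trick, as a hedged toy sketch (not the paper's exact algorithm, which also reranks drafts with n-gram interpolation): feed the model a weighted superposition of the k drafts' token embeddings so a single forward pass scores all drafts at once, then extend each draft from the top-k of the shared next-token distribution. The GPT-2 model choice, uniform weights, and the greedy top-k-to-draft assignment are all illustrative assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

k = 3                                 # number of drafts decoded in superposition
weights = torch.full((k,), 1.0 / k)   # uniform mixing weights (illustrative)

ids = tok("The best way to learn programming is", return_tensors="pt").input_ids[0]
emb = model.get_input_embeddings()
drafts = [ids.clone() for _ in range(k)]

with torch.no_grad():
    for _ in range(20):
        # Superpose: positions past the prompt hold the weighted mix of the
        # k drafts' token embeddings, so one forward pass covers all drafts.
        stacked = torch.stack([emb(d) for d in drafts])    # (k, T, d)
        mixed = (weights[:, None, None] * stacked).sum(0)  # (T, d)
        logits = model(inputs_embeds=mixed[None]).logits[0, -1]
        top = logits.topk(k).indices                       # one token per draft
        drafts = [torch.cat([d, t[None]]) for d, t in zip(drafts, top)]

for d in drafts:
    print(tok.decode(d))
```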
Can machines predict what can happen BEFORE and AFTER the image? Check out "VisualCOMET: Reasoning about the Dynamic Context of a Still Image" @ ECCV20 Spotlight @eccvconf - paper: arxiv.org/abs/2004.10796 - project page: visualcomet.xyz - live QA: 8/24 Mon 8:50 (UTC+1)
