Ziwei Liu
@liuziwei7
Associate Professor @ NTU - Vision, Learning and Graphics.
😃
🧠Video Thinking Test for Reasoning LLMs🧠 *Video Thinking Test* (📽️Video-TT📽️) is a holistic benchmark that assesses the correctness and robustness of advanced video reasoning and understanding, comparing LLMs against humans #ICCV2025 - Project: zhangyuanhan-ai.github.io/video-tt/ - Data: huggingface.co/datasets/lmms-…
x.com/liuziwei7/stat…
Video LLMs still misread short clips that humans find easy. This paper builds Video‑TT, a 1,000‑video benchmark that hides the usual shortcuts and checks both raw accuracy and how well models survive naturally confusing follow‑up questions. Humans answer correctly 84.3% of the…
12. Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding 🔑 Keywords: Video LLMs, Correctness, Robustness, Video Understanding, Human Intelligence 💡 Category: Computer Vision 🌟 Research Objective: - The study aims to evaluate…
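A minimal sketch of how one might load the Video-TT data from the Hugging Face Hub and compute the two scores the posts above describe (correctness on primary questions, robustness on the naturally confusing follow-ups). The dataset ID, split, and column names are placeholders I'm assuming for illustration, since the link in the post is truncated.

```python
# Sketch only: the dataset ID, split, and column names are hypothetical placeholders,
# because the link in the post is truncated ("huggingface.co/datasets/lmms-…").
from datasets import load_dataset

DATASET_ID = "<video-tt-dataset-id>"  # fill in the real Hugging Face dataset ID

def accuracy(predictions, references):
    """Plain exact-match accuracy; an illustrative metric, not the official scorer."""
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / max(len(references), 1)

ds = load_dataset(DATASET_ID, split="test")
# Correctness: accuracy on the primary question per video.
# Robustness: accuracy on the follow-up questions asked about the same videos,
# as described in the posts above.
```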
Morph4Data has now been released as well! It provides image pairs with diverse semantics and layouts, valuable for evaluating image morphing techniques.
🔥Tuning-free 2D image morphing🔥 Tired of complex training and strict semantic/layout demands? Meet #FreeMorph #ICCV2025: tuning-free image morphing across diverse situations -Project: yukangcao.github.io/FreeMorph -Paper: arxiv.org/abs/2507.01953 -Code: github.com/yukangcao/Free…
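This is not FreeMorph's actual algorithm (see the paper for that); below is just a generic illustration of the latent-interpolation idea behind diffusion-based image morphing, using spherical linear interpolation (slerp) between two latent codes.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent arrays (generic sketch)."""
    z0f, z1f = z0.ravel(), z1.ravel()
    cos_omega = np.dot(z0f, z1f) / (np.linalg.norm(z0f) * np.linalg.norm(z1f))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

# A morphing sequence would then decode a set of intermediate latents,
# e.g. [slerp(zA, zB, t) for t in np.linspace(0.0, 1.0, 8)].
```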
ShotBench: Cinematic Understanding Benchmark - 3,572 expert QA pairs - 3,049 images + 464 videos - 200+ Oscar-nominated films - 8 cinematography dimensions tested
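Since ShotBench reports results across 8 cinematography dimensions, here is a minimal sketch of aggregating per-dimension accuracy from QA results; the record keys are assumptions for illustration, not ShotBench's real schema.

```python
from collections import defaultdict

def per_dimension_accuracy(results):
    """results: iterable of dicts with hypothetical keys
    'dimension', 'prediction', 'answer' (not the benchmark's real field names)."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in results:
        d = r["dimension"]
        totals[d] += 1
        hits[d] += int(r["prediction"] == r["answer"])
    return {d: hits[d] / totals[d] for d in totals}
```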
At 8 PM Beijing time on July 8, Penghao Wu, a PhD student at MMLab, Nanyang Technological University, will give a live-streamed talk on "GUI-Reflection: A Training Framework that Gives Multimodal GUI Agents the Ability to Reflect on and Correct Their Mistakes".
🚀Empowering GUI Agents with Self-Reflection Behaviors🚀 🧠GUI-Reflection🧠 is an RL framework that enables end-to-end GUI agents to 1) recognize their own mistakes, 2) undo wrong actions, and 3) learn and retry better. - Page: penghao-wu.github.io/GUI_Reflection/ - Code: github.com/penghao-wu/GUI…
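A schematic sketch of the recognize-undo-retry behavior described above, not the GUI-Reflection training code itself; the `agent` and `env` interfaces are hypothetical.

```python
def run_with_reflection(agent, env, max_steps=50):
    """Illustrative loop only: `agent` and `env` are hypothetical interfaces,
    not the released GUI-Reflection API."""
    obs = env.reset()
    for _ in range(max_steps):
        action = agent.act(obs)
        obs, done = env.step(action)
        if agent.reflect(obs, action):          # 1) recognize its own mistake
            obs = env.undo(action)              # 2) undo the wrong action
            action = agent.retry(obs, action)   # 3) retry with the failure in context
            obs, done = env.step(action)
        if done:
            break
    return obs
```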
One more thing, I just found out that the #Anycoder feature from @huggingface can easily redesign the project page of #PhysX in a different style! #vibecoding
🌟Physical-Grounded 3D Asset Generation #PhysX is the first physics-grounded 3D generative suite, where #PhysXNet contains 6M objects with physical annotations! - Page: physx-3d.github.io - Code: github.com/ziangcao0312/P… - Data @huggingface: huggingface.co/datasets/Caoza…
PhysX Physical-Grounded 3D Asset Generation
2. PhysX: Physical-Grounded 3D Asset Generation 🔑 Keywords: 3D generative models, physical properties, PhysXNet, PhysXGen, generative physical AI 💡 Category: Generative Models 🌟 Research Objective: - To address the lack of physical properties in 3D generative models by introducing PhysX, an end-to-end paradigm for generating physics-grounded 3D assets. 🛠️…
2. PhysX: Physical-Grounded 3D Asset Generation 🔑 Keywords: 3D generative models, physical properties, PhysXNet, PhysXGen, generative physical AI 💡 Category: Generative Models 🌟 Research Objective: - To address the lack of physical properties in 3D generative models by…
🔥Physical-Grounded 3D Asset Generation🔥 #PhysX is the first physics-grounded 3D framework with *absolute scale*, *material*, *affordance*, *kinematics*, and *function* - Page: physx-3d.github.io - Code: github.com/ziangcao0312/P… - Data @huggingface: huggingface.co/datasets/Caoza…
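A minimal sketch of how the five annotated property families named above (absolute scale, material, affordance, kinematics, function) might be organized per asset; the field layout is an assumption for illustration, not PhysXNet's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class PhysicalAnnotation:
    """Illustrative per-asset record covering the five property families
    named in the post; NOT the real PhysXNet schema."""
    absolute_scale_m: float        # e.g. longest bounding-box edge, in meters
    materials: Dict[str, str]      # part name -> material label
    affordances: List[str]         # e.g. ["graspable", "sittable"]
    kinematics: Dict[str, str]     # joint name -> joint type ("revolute", ...)
    function: str = ""             # free-text functional description
```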
🚀 Excited to present our paper ProxyV at #ICML2025! 📍 East Exhibition Hall A-B #E-2608 🗓️ Thu Jul 17, 4:30–7:00 PM PDT Come check out our poster and chat with us!
🧵[1/n] Our #ICML2025 paper, Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM, is now on arXiv! Orthogonal to token-reduction approaches, we study computation-level redundancy on vision tokens within the decoder of LMMs. Paper Link: arxiv.org/abs/2505.15816
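To make "computation-level redundancy on vision tokens" concrete, here is a toy illustration of giving vision tokens a cheaper per-layer path than text tokens inside a decoder block; this is NOT the method proposed in the paper, and all module names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class MixedFFN(nn.Module):
    """Toy illustration only: route text tokens through a full FFN and vision
    tokens through a smaller one. NOT the design from the ICML 2025 paper."""
    def __init__(self, dim=1024, hidden=4096, light_hidden=512):
        super().__init__()
        self.full = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.light = nn.Sequential(nn.Linear(dim, light_hidden), nn.GELU(), nn.Linear(light_hidden, dim))

    def forward(self, x, is_vision):
        # x: (batch, seq, dim); is_vision: (batch, seq) boolean mask over tokens
        out = torch.empty_like(x)
        out[~is_vision] = self.full(x[~is_vision])
        out[is_vision] = self.light(x[is_vision])
        return out
```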
✨This AI generates transition images between two frames FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
High hopes for this one! ✨ MMSearch-R1 has arrived, dramatically transforming the search capabilities of LMMs! ✎. FYIG: Large language models (LMMs)…
Also, check out our multimodal-SAE paper accepted at ICCV 2025! We'll be refactoring the original code and rolling out new features in this repo over time. 📄 arxiv.org/abs/2411.14982
Tired of hooking SAEs to different models? Check out our new repo for plug-and-play SAE training—now as easy as other PEFT methods supported by huggingface! 🔗 github.com/EvolvingLMMs-L…
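The repo link above is truncated, so nothing below uses its actual API; this is just a minimal generic sparse-autoencoder training step (L1-penalized reconstruction of model activations) to illustrate what such a trainer does.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Generic SAE over hidden activations; not the API of the linked repo."""
    def __init__(self, d_model=1024, d_dict=8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, acts):
        codes = torch.relu(self.encoder(acts))   # sparse feature activations
        recon = self.decoder(codes)
        return recon, codes

def train_step(sae, acts, optimizer, l1_coeff=1e-3):
    """One reconstruction + sparsity update on a batch of activations."""
    recon, codes = sae(acts)
    loss = nn.functional.mse_loss(recon, acts) + l1_coeff * codes.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```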