Chen Change Loy
@ccloy
President's Chair Professor @NTUsg · Director of @MMLabNTU · Computer vision and deep learning
🚨 Call for Papers – Deadline Approaching! 📸 MIPI Workshop @ ICCV 2025 📅 Oct 20, 2025 | 📍 Honolulu, Hawai‘i Topics: low-level vision, enhancement, editing, efficient AI, camera systems & more! Don’t miss your chance to present—submit now: 🔗 openreview.net/group?id=thecv……

🔥🆕 ObjectClear is an object removal model that jointly eliminates the target object and its associated effects (shadows, reflections, etc.). ObjectClear app on @Huggingface: huggingface.co/spaces/jixin01…
👁️ ObjectClear: Complete Object Removal via Object-Effect Attention 🧹 Jupyter Notebook 🥳 Thanks to Jixin Zhao ❤ @ShangchenZhou ❤ @ZhouXia1212 ❤ @peiqing001 ❤ @ccloy ❤ 🌐page: zjx0101.github.io/projects/Objec… 🧬code: github.com/zjx0101/Object… 📄paper: arxiv.org/abs/2505.22636…
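For anyone who prefers scripting the Hugging Face demo over clicking through the web UI, here is a minimal sketch using gradio_client. The space ID, endpoint name, and input fields below are placeholders (the full space URL is truncated above), so check the Space's API tab for the real signature.

```python
# Hypothetical sketch: driving an object-removal Space programmatically with
# gradio_client. Space ID, endpoint name, and argument order are placeholders;
# the actual ObjectClear Space may expose a different API.
from gradio_client import Client, handle_file

client = Client("username/ObjectClear")            # placeholder space ID
result = client.predict(
    handle_file("photo_with_object.jpg"),          # input photo
    handle_file("object_mask.png"),                # mask covering the object to remove
    api_name="/predict",                           # placeholder endpoint name
)
print(result)  # typically a local path to the edited image returned by the Space
```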
At 8 pm Beijing time on July 8, Penghao Wu, a PhD student at MMLab, Nanyang Technological University, will give a live talk on "GUI-Reflection: a training framework that gives multimodal GUI agents the ability to reflect on and correct their own mistakes."
🚀Empowering GUI Agents with Self-Reflection Behaviors🚀 🧠GUI-Reflection🧠 is an RL framework that enables end-to-end GUI agents to 1) recognize their own mistakes, 2) undo wrong actions, 3) learn and retry better. - Page: penghao-wu.github.io/GUI_Reflection/ - Code: github.com/penghao-wu/GUI…
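The post lists the three behaviors the framework trains for: spot a mistake, undo it, retry. Purely as an illustration (not GUI-Reflection's actual code), the toy inference-time loop below shows how those behaviors fit together; every function here is a hypothetical stub, whereas GUI-Reflection learns these behaviors end-to-end with RL.

```python
# Toy illustration of a reflect-undo-retry loop for a GUI agent.
# All functions are hypothetical stubs, not GUI-Reflection code.
from dataclasses import dataclass, field

@dataclass
class Episode:
    goal: str
    history: list = field(default_factory=list)

def propose_action(ep: Episode) -> str:
    """Stub policy: pick the next GUI action (tap, type, scroll, ...)."""
    return f"tap(button) for goal '{ep.goal}'"

def action_succeeded(ep: Episode, action: str) -> bool:
    """Stub verifier: did the screen change as expected?"""
    return len(ep.history) >= 2   # pretend the first attempt fails

def undo(ep: Episode) -> None:
    """Stub undo: e.g. press back / revert the last UI change."""
    ep.history.append("undo")

def run(goal: str, max_retries: int = 3) -> None:
    ep = Episode(goal)
    for attempt in range(max_retries):
        action = propose_action(ep)
        ep.history.append(action)
        if action_succeeded(ep, action):   # 1) recognize success vs. mistake
            print(f"success on attempt {attempt + 1}: {ep.history}")
            return
        undo(ep)                           # 2) undo the wrong action
        # 3) retry with the failure now part of the agent's context
    print(f"gave up after {max_retries} attempts: {ep.history}")

run("open the settings page")
```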
✨ Breaking ✨ Did you know that old videos and AI-generated videos can be reborn with pro-level quality? 🎥✨ With #SeedVR2, blurry videos of old memories and AI-generated clips become stunningly high quality, as if by magic! 😊 👇️ I'll explain in this thread! #AITech #VideoUpscaling #FreeTools
✨ CVPR 2025 highlight #2 --- EdgeTAM: On-Device Track Anything Model EdgeTAM runs at 16 fps on an iPhone 15 Pro Max, about 22x faster than SAM 2 on the same phone, while maintaining performance comparable to SAM 2 on various video segmentation benchmarks 🔗 paper →…
It no longer leaves even a trace that something was there. ObjectClear doesn't just erase the object. Shadows cast on the floor, reflections in glass, glints of light... all of it is handled together. It suddenly reminds me of the effort I used to put into this in Photoshop.
One-step diffusion-based video restoration, SeedVR2, by @Iceclearwjy. The differences are easier to see in the comparisons on the project page: iceclear.github.io/projects/seedv…
🔥Introducing #SeedVR2, the latest one-step diffusion transformer version of #SeedVR for real-world image and video restoration! details - Paper: arxiv.org/abs/2506.05301 - Project: iceclear.github.io/projects/seedv… - Code (under review): github.com/IceClear/SeedV…
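The key phrase is "one-step": instead of running the usual iterative diffusion sampler, the restorer produces its output in a single forward pass. The toy snippet below only contrasts the two inference patterns; DummyRestorer is a stand-in, not SeedVR2's diffusion transformer.

```python
# Conceptual contrast between iterative diffusion sampling and one-step
# restoration at inference time. DummyRestorer is a stand-in, not SeedVR2.
import torch
import torch.nn as nn

class DummyRestorer(nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, noisy: torch.Tensor, degraded: torch.Tensor) -> torch.Tensor:
        # Predict a cleaner frame from the current estimate + degraded input.
        return self.net(torch.cat([noisy, degraded], dim=1))

model = DummyRestorer()
degraded = torch.rand(1, 3, 64, 64)        # a low-quality video frame

# Multi-step diffusion: start from noise and refine over many sampler steps.
x = torch.randn_like(degraded)
for _ in range(50):
    x = model(x, degraded)

# One-step inference pattern: a single forward pass per frame.
restored = model(torch.randn_like(degraded), degraded)
print(restored.shape)  # torch.Size([1, 3, 64, 64])
```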
Congratulations to Ziqi and Ziwei! Grateful for the opportunity to work with so many gifted students at @MMLabNTU. Their passion and creativity continue to inspire us! mmlab-ntu.com/team.html
Freshly picked: #NTUsg PhD student Huang Ziqi has been selected as one of 21 global recipients of the prestigious 2025 Apple Scholars in AIML PhD Fellowship, a programme that supports emerging leaders in AI and machine learning through funding, mentorship, and…
I’ve worked on evals (github.com/EvolvingLMMs-L…), and once considered turning it into a business. But here’s my honest take on the future of evaluation: evaluation should be non-profit and public. People naturally trust evaluations from non-profit organizations more than from…
Apropos of the LM Arena raise: I'm skeptical of eval startups, because so many failed in previous waves. Why? 1) great eval people should probably work on post-training instead, 2) customers graduate quickly to doing their own evals, 3) labs try really hard to goodhart eval startups
🚀 Just checked out Aero-1-Audio, a compact 1.5B audio model from @lmmslab trained in <24h on 16×H100. 🎧 Handles 15+ min audio with ease 💡 Outperforms Whisper, Qwen-2-Audio, ElevenLabs on key benchmarks 🧠 Once again, smart data > raw scale Blog: lmms-lab.com/posts/aero_aud…
🚀 Introducing Aero-1-Audio — a compact yet mighty audio model. ⚡ Trained in <24h on just 16×H100 🎧 Handles 15+ min audio seamlessly 💡 Outperforms bigger models like Whisper, Qwen-2-Audio & commercial services from ElevenLabs/Scribe Aero shows: smart data > massive scale.…
Oh man. So many amazing tools out there. I really gotta start making a bigger portfolio of what I've been learning. Using @ComfyUI & tools from @cocktailpeanut: generate an image with FLUX @bfl_ml, turn it into video via @Alibaba_Wan 2.1 Image to Video, then run MatAnyone to get a chromakeyed…
🚀 Excited to share the latest work by my student @ChongZhou7 during his internship at @Meta, EdgeTAM: an on-device ‘track anything’ model that brings SAM 2’s video segmentation power to mobile devices. It runs at 16 FPS on iPhone 15 Pro Max, delivering state-of-the-art accuracy…
EdgeTAM by Meta! 🔥 Apache 2.0 license 🔥 On-device deployment ready 🔥 Extends SAM 2 for tracking objects in videos 🔥 Click-to-segment support
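Since EdgeTAM extends SAM 2, it is reasonable to expect a SAM 2-style video-predictor workflow: click a point on one frame, then propagate masks through the video. The sketch below follows the SAM 2 reference API; the config/checkpoint paths are placeholders and EdgeTAM's actual entry points may differ, so treat it as an outline and check the repo.

```python
# Sketch of click-to-segment video tracking following the SAM 2 video-predictor
# API, which EdgeTAM extends. Paths are placeholders; exact entry points in the
# EdgeTAM repo may differ.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml",   # placeholder config
    "checkpoints/model.pt",                 # placeholder checkpoint
)

with torch.inference_mode():
    state = predictor.init_state(video_path="frames_dir")  # directory of JPEG frames

    # One positive click on the object in the first frame.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate the mask through the rest of the video.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # boolean masks per object
```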
Aero-1-Audio is a compact audio model adept at various audio tasks, including speech recognition, audio understanding, and following audio instructions. It is part of the Aero-1 series, the first generation of lightweight multimodal models developed by LMMs-Lab, with future…
Aero-1-Audio is out on Hugging Face Trained in <24h on just 16×H100 Handles 15+ min audio seamlessly Outperforms bigger models like Whisper, Qwen-2-Audio & commercial services from ElevenLabs/Scribe
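Since the model is on Hugging Face, a quick way to try it is via transformers. This is a hedged sketch only: the repo ID, the processor call signature, and the prompt format are assumptions (marked in the comments), so follow the official model card for the exact usage.

```python
# Hedged sketch of loading Aero-1-Audio with Hugging Face transformers.
# Repo ID, processor signature, and prompt format are assumptions; consult
# the model card on the Hub for the verified interface.
import librosa
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

repo = "lmms-lab/Aero-1-Audio"  # assumed repo ID; verify on the Hub

processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)

audio, sr = librosa.load("meeting.wav", sr=16000)   # long clips (15+ min) are the selling point

# Assumed processor interface: text prompt + raw audio array.
inputs = processor(
    text="Transcribe the audio.",
    audios=[audio],
    sampling_rate=sr,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```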
📢Excited to present our Denoising as Adaptation paper at #ICLR2025: ⏰3:00-5:30PM | Fri, Apr 25 📌Hall 3 + Hall 2B #176 📎 kangliao929.github.io/projects/noise… I'll be around and happy to chat!
Happy to share that our work "Denoising as Adaptation" has been accepted to #ICLR2025! Huge thanks to @ccloy and all collaborators. TL;DR: We propose a novel domain adaptation strategy in the noise space, achieved by taming a specialized diffusion loss to align the source and target distributions.
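As a rough illustration of the building block involved (not the paper's actual architecture or training recipe), the snippet below shows a standard epsilon-prediction diffusion loss conditioned on some auxiliary input; the paper's contribution lies in how such a loss is tamed so that minimizing it aligns source and target distributions in the noise space.

```python
# Illustrative epsilon-prediction diffusion loss (generic building block only;
# NOT the paper's method).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEpsNet(nn.Module):
    """Stand-in noise predictor conditioned on an extra image (e.g. a restoration output)."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, x_t: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, cond], dim=1))

def diffusion_loss(eps_net, x0, cond, alphas_cumprod):
    """Sample a timestep, noise x0, and score the predicted noise."""
    b = x0.size(0)
    t = torch.randint(0, alphas_cumprod.numel(), (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps   # forward diffusion
    return F.mse_loss(eps_net(x_t, cond), eps)

# Toy usage: gradients of this loss can flow back into whatever produced `cond`
# (e.g. a restoration network), which is the lever a noise-space objective gives you.
eps_net = TinyEpsNet()
alphas_cumprod = torch.linspace(0.999, 0.01, steps=1000)
x0 = torch.rand(2, 3, 32, 32)                          # clean images (toy)
cond = torch.rand(2, 3, 32, 32, requires_grad=True)    # conditioning images (toy)
loss = diffusion_loss(eps_net, x0, cond, alphas_cumprod)
loss.backward()
print(float(loss))
```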
🚀 Meet Harmon – a unified model for both image generation and understanding! Trained with a shared masked autoregressive encoder, it sets new state-of-the-art results on GenEval & MJHQ30K. 🖼️💬 Try the live demo now on Hugging Face: 👉 huggingface.co/spaces/wusize/… Paper: arxiv.org/abs/2503.21979…
🔥 We release Harmon: a unified framework for multimodal understanding & generation with a shared visual encoder (vs. the decoupled encoders in Janus/Janus-Pro). 💥 SOTA on GenEval, MJHQ, WISE 🧠 Strong understanding performance 📄 Paper: huggingface.co/papers/2503.21… 🔗 Code: github.com/wusize/Harmon
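The architectural point is "shared vs. decoupled": one visual encoder feeds both the understanding head and the generation head, rather than Janus-style separate encoders per task. A toy wiring sketch of that idea (not Harmon's actual code; all module names and sizes here are made up) follows.

```python
# Toy sketch of "one shared visual encoder, two task heads" (not Harmon's code).
import torch
import torch.nn as nn

class SharedEncoderToyModel(nn.Module):
    def __init__(self, dim: int = 256, text_vocab: int = 1000, image_vocab: int = 64):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)                 # shared visual encoder (stand-in)
        self.understand_head = nn.Linear(dim, text_vocab)  # text-token logits for VQA/captioning
        self.generate_head = nn.Linear(dim, image_vocab)   # image-token logits for generation

    def forward(self, visual_feats: torch.Tensor):
        z = self.encoder(visual_feats)                     # one representation serves both tasks
        return self.understand_head(z), self.generate_head(z)

model = SharedEncoderToyModel()
feats = torch.rand(1, 16, 256)                             # fake patch features
text_logits, image_logits = model(feats)
print(text_logits.shape, image_logits.shape)
```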
💥 Consistent Multi-View Diffusion for 3D Enhancement 💥 Introducing our work #3DEnhancer @CVPR: a multi-view diffusion model that enhances multi-view images to improve 3D models. 📰arXiv: arxiv.org/abs/2412.18565 🔥Project: yihangluo.com/projects/3DEnh…
🔥Foundation Models for 3D/4D Motion Capture🔥 We present 📸SMPLest-X📸, the ultimate scaling law for expressive human pose and shape estimation. - Project: caizhongang.com/projects/SMPLe… - Paper: arxiv.org/pdf/2501.09782 - Code: github.com/wqyin/SMPLest-X
🔥 WHAC is here! Code released + WHAC-A-Mole dataset that features dual motions & moving cameras. Powered by SMPLest-X—ultimate scaling to hit data saturation for the first time with 40 SMPL(-X) datasets! 🚀 🔗 wqyin.github.io/projects/WHAC/ 🔗 caizhongang.com/projects/SMPLe…