Felix Juefei Xu
@felixudr
Research Scientist @Meta Superintelligence Labs | Robust, Efficient, Multimodal GenAI | PhD @CarnegieMellon | Views are my own.
This is HUGE! Congrats Zhiding and the Eagle 2.5 team!
And today we have just open-sourced the Eagle 2.5 model huggingface.co/nvidia/Eagle2.… You are welcome to download it and give it a try! We will also open-source the fine-tuning code for Eagle 2/2.5 soon at github.com/NVlabs/Eagle. Stay tuned.
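For anyone who wants to try it, here is a minimal loading sketch, assuming the checkpoint follows the usual Hugging Face custom-code pattern; the tweet's URL is truncated, so the repo id below is illustrative, and the exact processor usage and chat template should be taken from the model card.

```python
# Minimal sketch, assuming Eagle 2.5 ships custom modeling code on the Hub
# (check the model card for the exact repo id and inference recipe).
from transformers import AutoModel, AutoProcessor

repo_id = "nvidia/Eagle2.5-8B"  # illustrative; the URL in the post is truncated

processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo_id,
    trust_remote_code=True,   # load the repo's custom model class
    torch_dtype="auto",
    device_map="auto",
)
```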
🫡 Well said, Saining! This is a lesson for us all. The community is stronger and better off when we approach these challenges thoughtfully and with integrity. Let’s keep learning and growing together.
Thanks for bringing this to my attention. I honestly wasn’t aware of the situation until the recent posts started going viral. I would never encourage my students to do anything like this—if I were serving as an Area Chair, any paper with this kind of prompt would be…
Very timely benchmark on cinematographic evals. @cliangyu_
📽️Expert-Level Cinematic Understanding in VLM📽️ #ShotBench: benchmark covering 8 core cinematography dimensions #ShotQA: 70k training dataset #ShotVL: 3B and 7B models surpassing GPT-4o on cinematic understanding - Project: vchitect.github.io/ShotBench-proj… - Code: github.com/Vchitect/ShotB…
MetaQuery is now open source, with both the data and code available.
The code and instruction-tuning data for MetaQuery are now open-sourced! Code: github.com/facebookresear… Data: huggingface.co/collections/xc… Two months ago, we released MetaQuery, a minimal training recipe for SOTA unified understanding and generation models. We showed that tuning few…
We find that training unified multimodal understanding and generation models is so easy, you do not need to tune MLLMs at all. The MLLM's knowledge/reasoning/in-context learning can be transferred from multimodal understanding (text output) to generation (pixel output) even when it is FROZEN!
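A rough PyTorch sketch of that idea (not the released MetaQuery code; the class name, query count, and the HF-style `inputs_embeds`/`output_hidden_states` interface are assumptions here): keep the MLLM frozen and train only a small set of learnable query tokens plus a connector that feeds their outputs into an image generation decoder.

```python
# Conceptual sketch: frozen MLLM, trainable queries + connector for generation.
import torch
import torch.nn as nn

class FrozenMLLMToGenerator(nn.Module):
    def __init__(self, mllm: nn.Module, hidden_dim: int, cond_dim: int, num_queries: int = 64):
        super().__init__()
        self.mllm = mllm
        for p in self.mllm.parameters():       # the MLLM stays frozen
            p.requires_grad_(False)
        self.queries = nn.Parameter(torch.randn(num_queries, hidden_dim) * 0.02)
        self.connector = nn.Sequential(        # trainable bridge to the image decoder
            nn.Linear(hidden_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Append the learnable queries to the embedded multimodal prompt,
        # run the frozen MLLM, and read out hidden states at the query positions.
        b = input_embeds.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        x = torch.cat([input_embeds, q], dim=1)
        hidden = self.mllm(inputs_embeds=x, output_hidden_states=True).hidden_states[-1]
        query_states = hidden[:, -q.size(1):]
        return self.connector(query_states)    # conditioning for a diffusion decoder
```

Only `queries` and `connector` receive gradients, which is what lets understanding-side knowledge flow into pixel-side generation without touching the MLLM weights.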
🎉 Exciting News from the Second Workshop on Efficient and On-Device Generation (EDGE) at CVPR 2025! 🎉 We're thrilled to announce the winners of the paper awards at the CVPR 2025 EDGE workshop, proudly sponsored by PixVerse (@PixVerse_). Huge congratulations to the award…
Cutie of the day 🐶 Spotted this little conference attendee during the poster session. Easily stole the show. #CVPR2025 #PosterSessionPup @CVPR
CVPR 2nd Workshop on Efficient and On-Device Generation (EDGE) Schedule (June 12)
📅 Date: June 12, 2025 | 🕐 Time: 13:00 - 17:00 📍 Room: 208 A
13:00 - 13:10 | Opening remarks & award announcement
13:10 - 13:35 | Ziwei Liu (NTU) | "From Multimodal Generative Models to Dynamic…