Cihang Xie
@cihangxie
Assistant Professor, @BaskinEng; PhD, @JHUCompSci; @Facebook Fellowship Recipient; 🐱
OpenVision is accepted by #ICCV2025 🥳🥳 Additionally, stay tuned for v2, arriving very soon with even greater efficiency and capability.
Still relying on OpenAI’s CLIP — a model released 4 years ago with limited architecture configurations — for your Multimodal LLMs? 🚧 We’re excited to announce OpenVision: a fully open, cost-effective family of advanced vision encoders that match or surpass OpenAI’s CLIP and…
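For readers who want to try it out, here is a minimal sketch of loading an OpenVision checkpoint as a drop-in CLIP-style encoder via open_clip. The hub ID and image filename are assumptions for illustration only; check the official OpenVision release for the exact checkpoint names.

```python
# Minimal sketch: use an OpenVision checkpoint as a CLIP-style zero-shot encoder.
# The hub ID below is an ASSUMPTION for illustration; see the official release
# for the real checkpoint names.
import torch
import open_clip
from PIL import Image

hub_id = "hf-hub:UCSC-VLAA/openvision-vit-base-patch16-224"  # assumed checkpoint name
model, _, preprocess = open_clip.create_model_and_transforms(hub_id)
tokenizer = open_clip.get_tokenizer(hub_id)

image = preprocess(Image.open("example.jpg")).unsqueeze(0)   # hypothetical image file
text = tokenizer(["a diagram", "a cat", "a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Standard CLIP-style zero-shot scoring: cosine similarity + softmax
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(probs)
```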
I’ll be at @CVPR until June 14 — ping me if you’re around for a coffee chat! ☕️✨ Lots of exciting work ahead 🚀 📅 June 11 🩺 Medical Vision Foundation Model Workshop ⏰ 8:30 AM – 12:00 PM | 📍 Room 212 🔗 homepage: fmv-cvpr25workshop.github.io 📅 June 13 🎨 Generative Image Layer…
And today we have just open-sourced the Eagle 2.5 model: huggingface.co/nvidia/Eagle2.… You are welcome to download it and give it a try! We will also open-source the fine-tuning code for Eagle 2/2.5 soon at github.com/NVlabs/Eagle. Stay tuned.
I did not notice this until just now. Thank you @andimarafioti for the recommendation! Very glad that even though Eagle 2 is not our latest work, people still find it very useful.
Just arrived at #ICML25! If you're interested in multimodality, reasoning, and safety — let's connect. Additionally, I'll also be presenting two papers: 📄 What If We Recaption Billions of Web Images with LLaMA-3? 🗓 Tue, Jul 15 | 11 AM – 1:30 PM PDT 📍 East Hall A-B (#E-3305)…

1/6 Introducing MoCa, a new method for continual pre-training of multimodal embeddings! 🚀 MoCa is the first to effectively scale with unlabeled interleaved image-text data, marking a paradigm shift in multimodal embeddings. Paper, code, & checkpoints! 👇 #AI #Multimodal #ML #NLP
Show-o2 is out! 🤗 1/ it now natively supports video (image and video share the same 3D VAE encoder), 2/ it ships in both 1.5B and 7B sizes, 3/ a dual-path fusion before the LLM enables unified visual representations for understanding and generation. Open source at github.com/showlab/Show-o 🌟
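To make the "dual-path fusion before the LLM" idea concrete, here is a small, hypothetical PyTorch sketch (not the actual Show-o2 code): one projection path keeps understanding-oriented features, another keeps generation-oriented features, and the two are merged into a single visual token stream. All module names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualPathFusion(nn.Module):
    """Illustrative sketch (NOT the Show-o2 implementation): fuse a semantic-path
    and a low-level-path projection of shared 3D-VAE latents into one sequence
    of visual tokens for the LLM."""

    def __init__(self, vae_dim: int, llm_dim: int):
        super().__init__()
        self.semantic_proj = nn.Linear(vae_dim, llm_dim)   # understanding-oriented path
        self.lowlevel_proj = nn.Linear(vae_dim, llm_dim)   # generation-oriented path
        self.fuse = nn.Linear(2 * llm_dim, llm_dim)        # merge the two paths

    def forward(self, vae_latents: torch.Tensor) -> torch.Tensor:
        # vae_latents: (batch, num_tokens, vae_dim) from the shared image/video 3D VAE
        sem = self.semantic_proj(vae_latents)
        low = self.lowlevel_proj(vae_latents)
        return self.fuse(torch.cat([sem, low], dim=-1))    # (batch, num_tokens, llm_dim)

# Usage: fuse 256 latent tokens of width 16 into 4096-d LLM inputs
tokens = DualPathFusion(vae_dim=16, llm_dim=4096)(torch.randn(2, 256, 16))
print(tokens.shape)  # torch.Size([2, 256, 4096])
```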
Community Notes say this was AI-generated — but can you really tell? (At least I cannot 😅) Also, I am curious to learn which model can generate such a hyper-realistic video 🤣
I can’t explain it but I know it’s an orange cat
Finally, it is accepted by #ICCV2025 🥳🥳
🚀 Introducing VideoLLaMB, our latest video understanding framework! Specifically, leveraging our newly developed Memory Bridge Layers, VideoLLaMB can encode 100% of video content without discarding critical visual cues. Empirically, VideoLLaMB attains state-of-the-art…
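As a rough illustration of the recurrent-memory idea behind such layers (a hypothetical sketch, not the VideoLLaMB implementation), the snippet below processes a video segment by segment while a small set of memory tokens carries information across segments, so earlier content is never simply dropped. All names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class MemoryBridgeSketch(nn.Module):
    """Hypothetical illustration (NOT the VideoLLaMB code): a recurrent set of
    memory tokens summarizes each video segment via attention, so later
    segments can be encoded without discarding earlier content."""

    def __init__(self, dim: int, num_memory_tokens: int = 16):
        super().__init__()
        self.memory = nn.Parameter(torch.zeros(1, num_memory_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, segments: list) -> torch.Tensor:
        # segments: list of (batch, seg_len, dim) frame-feature chunks
        mem = self.memory.expand(segments[0].shape[0], -1, -1)
        for seg in segments:
            # memory tokens attend over the current segment plus the previous memory
            context = torch.cat([mem, seg], dim=1)
            update, _ = self.attn(mem, context, context)
            mem = self.norm(mem + update)
        return mem  # compact summary covering all segments seen so far

# Usage: three 64-frame segments of 768-d features -> 16 memory tokens
segs = [torch.randn(2, 64, 768) for _ in range(3)]
print(MemoryBridgeSketch(768)(segs).shape)  # torch.Size([2, 16, 768])
```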
Coming to SF to support my student @HaoqinT’s talk at the @arizeai Observe event. Good job 👍

Thank you @GoogleResearch for supporting our healthcare research! Honored to be one of the Google Research Scholars this year!
We’re announcing the 87 professors selected for the 2025 Google Research Scholar Program — join us in congratulating these exceptional recipients and learn more about their groundbreaking work at goo.gle/rs-recipients. #GoogleResearch #GoogleResearchScholar
I’ll be giving a talk on our VLAA-Thinker 🤔 at the @arizeai Observe event at @SHACK15sf next Wednesday; swing by to chat about vision-language reasoning models! Always happy to discuss broader ideas around multimodal reasoning and generative models too: arize.com/observe-2025/a…
In this earlier post, we believed SFT would be crucial for multimodal reasoning models, and thus released the VL-Thinking dataset to facilitate research in this direction. However, our recent findings show a surprising shift: SFT can hinder learning, often inducing…
We are at the Johns Hopkins booth at @CVPR. Come join us 😁 @JHUCompSci @HopkinsEngineer @HopkinsDSAI
🚀 Excited to introduce SimWorld: an embodied simulator for infinite photorealistic world generation 🏙️ populated with diverse agents 🤖 If you are at #CVPR2025, come check out the live demo 👇 Jun 14, 12:00-1:00 pm at JHU booth, ExHall B Jun 15, 10:30 am-12:30 pm, #7, ExHall B
Please visit our poster this afternoon. @Jinrui_Yang_, @yuyinzhou_cs, and I will all be there.
I will present my poster, "LayerDecomp," at #CVPR2025 Friday afternoon (June 13th). ⏰Time: 4PM - 6PM. 📍Location: #217, Exhibit Hall D. Stop by and say hi! I'd love to chat with you about our research. #CVPR2025 #CVPR25 #ComputerVision #GenAI #AIResearch #DeepLearning
🚀 Introducing LayerDecomp: our latest generative framework for image layer decomposition, which can output photorealistic clean backgrounds and high-quality transparent foregrounds, faithfully preserving visual effects like shadows and reflections. Our key contributions include…
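Not the LayerDecomp model itself, but a tiny sketch of what such outputs enable: because the foreground is a transparent RGBA layer with shadows and reflections baked in, it can be alpha-composited back onto the clean background (or any new one) with standard tooling. File names below are hypothetical.

```python
from PIL import Image

# Hypothetical file names: outputs of an image-layer decomposition,
# assumed to have the same resolution.
background = Image.open("clean_background.png").convert("RGBA")
foreground = Image.open("transparent_foreground.png").convert("RGBA")  # effects carried in alpha

# Recomposite the layers; shadows and reflections travel with the
# foreground layer's alpha channel, so any background works.
recomposited = Image.alpha_composite(background, foreground)
recomposited.save("recomposited.png")
```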
Prof. Kather’s talk on AI Agents in Oncology and Cancer Research is happening right now in Room 212, Music City Center @CVPR
Prof. Kather @jnkath is presenting on AI Agents in Oncology and Cancer Research! Come join us! @CVPR
Submit your extended abstract to our workshop on "Generative Models for Computer Vision" #CVPR2025 @CVPR Authors with accepted CVPR papers are welcome to present their poster as well! Deadline: April 25th We also have an incredible speaker line-up!
💡 New work: You might not need math data to teach models math reasoning. Recent 🔥 RLVR works challenge the need for *labels* on math questions. We find that just playing video games, e.g., Snake, can boost multimodal reasoning. No math *questions* needed. arxiv.org/abs/2506.08011 🧵👇
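As a toy illustration of how a verifiable, label-free game reward could work (an assumption-laden sketch, not the paper's actual setup): the game state itself checks the model's chosen move, so no math questions and no human annotations are needed.

```python
# Toy sketch of a verifiable reward from a Snake-like game state
# (NOT the paper's implementation): the environment verifies the model's move,
# so rewards require no math questions and no human labels.

MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def snake_move_reward(head, body, food, grid_size, move: str) -> float:
    if move not in MOVES:
        return -1.0                      # unparseable answer
    dx, dy = MOVES[move]
    new_head = (head[0] + dx, head[1] + dy)
    in_bounds = 0 <= new_head[0] < grid_size and 0 <= new_head[1] < grid_size
    if not in_bounds or new_head in body:
        return -1.0                      # move kills the snake
    if new_head == food:
        return 1.0                       # move eats the food
    old_dist = abs(head[0] - food[0]) + abs(head[1] - food[1])
    new_dist = abs(new_head[0] - food[0]) + abs(new_head[1] - food[1])
    return 0.5 if new_dist < old_dist else 0.0   # shaped reward toward the food

# Example: snake head at (2, 2), food at (4, 2) on a 6x6 grid
print(snake_move_reward((2, 2), [(1, 2), (0, 2)], (4, 2), 6, "right"))  # 0.5
```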
Arrived at #CVPR2025! Looking forward to a great week. Let me know if you’re here and up for a chat!