Juan Carlos Niebles
@jcniebles
Computer Vision. Research Director @Salesforce @SFResearch, Adjunct Professor @Stanford @StanfordSVL.
New blog post: "Are your Visual Programs Right for the Wrong Reasons?" 🤔 Dive into the motivation behind our upcoming @CVPR #CVPR2025 paper! 📰 Blog: niebles.net/blog/2025/viun… ➡️ Project: artemisp.github.io/viunit/ 📄 Paper: arxiv.org/abs/2412.08859 w/ @artemispng & @zhou_honglu
🎉Just Announced: "ViUniT: Visual Unit Tests for More Robust Visual Programming" has been accepted at #CVPR2025! Paper Link: arxiv.org/pdf/2412.08859 Project Page: artemisp.github.io/viunit/ Researcher’s walk-through 👇 In collaboration with @UPenn, we introduce ViUniT, a framework…
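To make the "right for the wrong reasons" failure mode concrete, here is a minimal, self-contained sketch of the unit-testing idea (the `Box`/`detect` helpers are hypothetical stand-ins, not the ViUniT API): a candidate visual program passes only if it returns the expected answer on every synthesized test image.

```python
from dataclasses import dataclass

@dataclass
class Box:
    label: str
    x: float  # horizontal center of the detection, 0.0 = left edge

def detect(image, label):
    """Stand-in detector: an 'image' here is just a list of Boxes."""
    return [b for b in image if b.label == label]

def candidate_program(image):
    """Toy visual program for 'Is the red ball left of the dog?'.
    In ViUniT-style pipelines, programs like this are LLM-generated."""
    balls, dogs = detect(image, "red ball"), detect(image, "dog")
    if not balls or not dogs:
        return "no"
    return "yes" if balls[0].x < dogs[0].x else "no"

def pass_rate(program, unit_tests):
    """unit_tests: (image, expected_answer) pairs; the tests are
    synthesized so that both 'yes' and 'no' outcomes are covered."""
    hits = sum(program(img) == want for img, want in unit_tests)
    return hits / len(unit_tests)

tests = [
    ([Box("red ball", 0.2), Box("dog", 0.8)], "yes"),  # ball left of dog
    ([Box("dog", 0.1), Box("red ball", 0.9)], "no"),   # ball right of dog
    ([Box("dog", 0.5)], "no"),                         # no ball at all
]
print(pass_rate(candidate_program, tests))  # 1.0: right for the right reasons
```

A degenerate program that always answers "yes" would pass only one of the three tests above, which is exactly the shortcut behavior that unit tests are meant to expose.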
🎞️ AI Research Lab - Explained: What Are Small Language Models? Watch now: youtube.com/watch?v=1Rlr2O… While everyone's chasing bigger models, we're proving smaller is smarter. 💡 Our new episode explores Small Language Models (SLMs)—specialized AI that delivers: ⚡ 10x faster…
1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 grafting.stanford.edu Co-led with @MichaelPoli6
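For a flavor of what editing a pretrained architecture on a small compute budget can look like, here is a hedged PyTorch sketch of the general recipe as I read it (the `GatedConvMixer` operator and the random stand-in activations are illustrative assumptions, not the paper's actual operators or data): swap one operator, then fit the replacement to the frozen original's outputs before any end-to-end finetuning.

```python
import torch
import torch.nn as nn

class GatedConvMixer(nn.Module):
    """Hypothetical cheap token mixer, standing in for a new design."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, tokens, dim)
        mixed = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return mixed * torch.sigmoid(self.gate(x))

dim = 64
old_op = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)  # frozen original
new_op = GatedConvMixer(dim)                                        # grafted operator

# Stage 1: activation distillation -- train only the new operator to
# mimic the frozen original. A real pipeline would sample activations
# from the pretrained model on real data; random tensors stand in here.
opt = torch.optim.AdamW(new_op.parameters(), lr=1e-3)
for step in range(200):
    x = torch.randn(8, 16, dim)
    with torch.no_grad():
        target, _ = old_op(x, x, x)
    loss = nn.functional.mse_loss(new_op(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2 (not shown): briefly finetune the edited model end to end.
```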
Great to see our AI Research Lab series continue with @jcniebles breaking down multimodal AI. Cross-modal reasoning is more than impressive - it's how #EnterpriseAI can truly understand the world. Definitely worth a watch:
🚨New Episode Drop!🚨 🧠 AI Research Lab - Explained: The Future is Multimodal You text, share photos, record videos—seamlessly switching between data types. Why can't AI? Our Salesforce AI team builds multimodal systems that understand text, images, audio, and video…
Check out our AI Research Lab - Explained episode on Multimodal AI. Had a blast creating this episode with the team! @SFResearch
ViUniT is on! Come learn about our #CVPR2025 Poster #346 #visualprogramming #vq #ai
🚨 Are visual programs actually reasoning correctly? Spoiler: 40% of the time, they get the right answer for the wrong reason. Come check out our #CVPR2025 poster (#346) tomorrow — Sunday, June 15th from 10:30am–12:30pm CDT!
It’s happening today! Come check out our #CVPR2025 poster #346 — Sunday, June 15th from 10:30am–12:30pm CDT. Blog: niebles.net/blog/2025/viun… Arxiv: arxiv.org/abs/2412.08859
Coming up soon! Stop by our poster on long video understanding. 🖼️ExHall D Poster #306, Fri Jun 13, 4-6pm arxiv: arxiv.org/abs/2504.02259
Want to process hour-long videos? Come talk with us about temporal search at ExHall D Poster #306, Fri Jun 13, 4-6pm @CVPR. T* plugs into any VLM; try it with your own demos!
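Temporal search is easy to illustrate. Below is a hedged, self-contained sketch of a coarse-to-fine search loop in the same spirit (my abstraction, not the T* code; `score_frame` stands in for whatever VLM relevance scorer you plug in): sample frames sparsely, score them, and zoom into the most promising region until the frame budget runs out.

```python
def temporal_search(num_frames, score_frame, budget=32, fanout=8):
    """Coarse-to-fine keyframe search. score_frame(t) -> relevance of
    frame t to the query; in practice this wraps any VLM."""
    lo, hi = 0, num_frames
    visited = []
    while budget > 0 and hi - lo > fanout:
        step = max((hi - lo) // fanout, 1)
        ts = list(range(lo, hi, step))[:fanout]   # sparse sample
        scores = [(score_frame(t), t) for t in ts]
        budget -= len(ts)
        visited.extend(ts)
        _, best = max(scores)                     # most relevant frame
        lo, hi = max(lo, best - step), min(hi, best + step)  # zoom in
    return sorted(set(visited))

# Toy check: the relevant moment sits near frame 54_000 of a 1-hour,
# 30 fps video; the search homes in using only ~32 frame scores.
relevance = lambda t: -abs(t - 54_000)
print(temporal_search(108_000, relevance))
```

Because the scorer is a black box, any VLM can slot in; the search logic itself never looks inside the frames.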
Congrats Chaitanya on winning the BEST PAPER AWARD 🥇 🏆 #CVPR2025 Check out details of our work: arxiv.org/abs/2504.12513
Our first poster is up! 🕐Come check it out right now until 13:00 “AdaVid: Adaptive Video-Language Pretraining” 🪧ExHall D Poster #203 📝arxiv.org/abs/2504.12513
I'll also be presenting multiple papers at #CVPR2025! First up: "AdaVid: Adaptive Video-Language Pretraining". 🗓️ Thu Jun 12, 12:00-13:00 📍 ExHall D Poster #202 🔗 Paper: arxiv.org/abs/2504.12513 🌐 Website: chaitanya100100.github.io/AdaVid/ #VideoLanguage #Pretraining
I'm attending #CVPR2025 in Nashville this week. We have a few presentations this year, so feel free to drop by and talk with us 🙌 BLIP3-o: A Family of Fully Open Unified Multimodal Models - Architecture, Training and Dataset (arxiv.org/abs/2505.09568) @ Jun 11, 101B, 2:10pm The 4th Workshop on…
Just finished a day at the #CVPR25 Area Chair workshop. Lots of interesting discussions and ideas, and a chance to reconnect with colleagues and friends. I also presented our ViUniT poster to fellow ACs. If you missed it, come to our Sunday poster session. See details in the 🧵⬇️
Kicking things off on June 11th by participating in the #CVPR2025 Area Chair workshop! Eager to connect with fellow ACs and colleagues. Let's make this an impactful conference!
Need to try out new diffusion architectures? @keshigeyan's most recent paper on Grafting explores this with really interesting results. Check it out!
For all the folks attending #CVPR2025 @CVPR interested in GenAI for Visual Generation, Editing & Understanding with a human-centric (or interdisciplinary) scope: don't miss a new edition of our #CVEU workshop 👇🏼
🗓️CVEU Workshop Schedule #CVPR2025 📍All times in Nashville Time 🔗Full program details can be found here: cveu.github.io