Chris Rockwell
@_crockwell
PhD student in #ComputerVision at @UmichCSE. Views are my own.
Excited to share ☀️Lightspeed⚡, a photorealistic, synthetic dataset with ground truth pose used for benchmarking alongside DynPose-100K! Now available for download: huggingface.co/datasets/nvidi… Paper accepted to #CVPR2025: arxiv.org/abs/2504.17788
Ever wish YouTube had 3D labels? 🚀Introducing🎥DynPose-100K🎥, an Internet-scale collection of diverse videos annotated with camera pose! Applications include camera-controlled video generation🤩and learned dynamic pose estimation😯 Download: huggingface.co/datasets/nvidi…
Hello! If you are interested in dynamic 3D or 4D, don't miss oral session 3A at 9 am on Saturday: @zhengqi_li will be presenting "MegaSaM," I'll be presenting "Stereo4D," and @QianqianWang5 will be presenting "CUT3R."
Excited to share our CVPR 2025 paper on cross-modal space-time correspondence! We present a method to match pixels across different modalities (RGB-Depth, RGB-Thermal, Photo-Sketch, and cross-style images) — trained entirely using unpaired data and self-supervision. Our…
Ever wondered how a scene sounds👂 when you interact👋 with it? Introducing our #CVPR2025 work "Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes" -- we make 3D scene reconstructions audibly interactive! yimingdou.com/hearing_hands/
Can AI image detectors keep up with new fakes? Mostly, no. Existing detectors are trained using a handful of models. But there are thousands in the wild! Our work, Community Forensics, uses 4800+ generators to train detectors that generalize to new fakes. #CVPR2025 🧵 (1/5)
Hello! If you like pretty images and videos and want a rec for a CVPR oral session, you should def go to Image/Video Gen, Friday at 9am: I'll be presenting "Motion Prompting," @RyanBurgert will be presenting "Go with the Flow," and @ChangPasca1650 will be presenting "LookingGlass."
Cameras are key to modeling our dynamic 3D visual world. Can we unlock the 𝘥𝘺𝘯𝘢𝘮𝘪𝘤 3𝘋 𝘐𝘯𝘵𝘦𝘳𝘯𝘦𝘵?! 🌎 📸 𝗗𝘆𝗻𝗣𝗼𝘀𝗲-𝟭𝟬𝟬𝗞 is our answer! @_crockwell has curated Internet-scale videos with camera pose annotations for you 🤩 Download: huggingface.co/datasets/nvidi…
🌌 NVIDIA Cosmos -- our World Foundation Model platform! Super excited to have made core contributions in multiple aspects. Physical AI is key to modeling the universe of worlds 🌎! 75-page tech report 📄: research.nvidia.com/publication/20… Try them out now 😲! github.com/NVIDIA/Cosmos
Whether you're a researcher or developer, #NVIDIACosmos world foundation models are now openly available under our permissive license to the physical AI community via NGC & @huggingface. 🤗 #CES2025 See how Cosmos is democratizing #physicalAI development: nvda.ws/3DJwE5v
Video generation models exploded onto the scene in 2024, sparked by the release of Sora from OpenAI. I wrote a blog post on key techniques that are used in building large video generation models: yenchenlin.me/blog/2025/01/0…
Introducing 👀Stereo4D👀 A method for mining 4D from internet stereo videos. It enables large-scale, high-quality, dynamic, *metric* 3D reconstructions, with camera poses and long-term 3D motion trajectories. We used Stereo4D to make a dataset of over 100k real-world 4D scenes.
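Why do stereo videos unlock *metric* (real-world-scale) reconstruction? Because the baseline between the two cameras is known, the standard rectified-stereo relation depth = focal × baseline / disparity recovers absolute depth. A minimal numpy sketch of that relation (the rig numbers below are illustrative, not values from the paper):

```python
import numpy as np

def disparity_to_metric_depth(disparity_px, focal_px, baseline_m):
    """Convert a rectified-stereo disparity map (pixels) to metric depth (meters).

    Standard pinhole relation: depth = f * B / d. Invalid (zero or negative)
    disparities are mapped to infinity.
    """
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(disparity_px, np.inf)
    valid = disparity_px > 0
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth

# Toy example: 700 px focal length, 6.5 cm baseline (an assumed rig spacing),
# 10 px disparity -> 4.55 m; zero disparity -> infinitely far.
depth = disparity_to_metric_depth([[10.0, 0.0]], focal_px=700.0, baseline_m=0.065)
```

The same relation explains why monocular internet video cannot give metric scale: with no known baseline, depth is only recoverable up to an unknown scale factor.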
What happens when you train a video generation model to be conditioned on motion? Turns out you can perform "motion prompting," just like you might prompt an LLM! Doing so enables many different capabilities. Here’s a few examples – check out this thread 🧵 for more results!
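To condition a video model on motion, sparse point tracks have to be turned into a spatio-temporal signal the network can ingest. A minimal numpy sketch of the simplest possible encoding, a binary rasterized track volume; this format is an illustrative guess, not the paper's actual representation:

```python
import numpy as np

def rasterize_tracks(tracks, T, H, W):
    """Rasterize sparse (x, y) point tracks into a binary (T, H, W) volume
    that a video model could take as a conditioning input.

    tracks: array of shape (N, T, 2), one (x, y) position per track per frame.
    Off-screen points are simply skipped.
    """
    vol = np.zeros((T, H, W), dtype=np.float32)
    for track in tracks:
        for t, (x, y) in enumerate(track):
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < W and 0 <= yi < H:
                vol[t, yi, xi] = 1.0
    return vol

# One track moving rightward across a 4-frame, 8x8 clip.
tracks = np.array([[[1.0, 4.0], [2.0, 4.0], [3.0, 4.0], [4.0, 4.0]]])
cond = rasterize_tracks(tracks, T=4, H=8, W=8)
```

In practice such a volume would be concatenated with (or cross-attended against) the model's latent video features; the "prompting" analogy comes from swapping in different track sets at inference time.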
We present Global Matching Random Walks, a simple self-supervised approach to the Tracking Any Point (TAP) problem, accepted to #ECCV2024. We train a global matching transformer to find cycle consistent tracks through video via contrastive random walks (CRW).
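At its core, a contrastive random walk builds soft affinity matrices between adjacent frames, walks forward then backward in time, and supervises each point to return to itself, so no labels are needed. A numpy sketch of that cycle-consistency loss (an illustration of the general CRW objective, not the paper's global matching transformer):

```python
import numpy as np

def softmax_affinity(feats_a, feats_b, temperature=0.07):
    """Row-softmax over feature similarities: transition probs from a to b."""
    sim = feats_a @ feats_b.T / temperature          # (N, N) similarities
    sim = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    expd = np.exp(sim)
    return expd / expd.sum(axis=1, keepdims=True)

def cycle_consistency_loss(feats_t, feats_t1):
    """Walk t -> t+1 -> t; penalize round trips that miss the start node."""
    fwd = softmax_affinity(feats_t, feats_t1)        # P(t -> t+1)
    bwd = softmax_affinity(feats_t1, feats_t)        # P(t+1 -> t)
    cycle = fwd @ bwd                                # P(t -> t+1 -> t)
    # Cross-entropy against the identity: -log prob of returning home.
    return -np.log(np.diag(cycle) + 1e-12).mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
low = cycle_consistency_loss(feats, feats)                        # perfect matches
high = cycle_consistency_loss(feats, rng.normal(size=(8, 16)))    # unrelated frames
```

The "contrastive" part is that the softmax makes every other point a negative: a point can only raise its own return probability by lowering everyone else's, which is what forces discriminative features.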
📢 Introducing our #ECCV2024 work, COCO-ReM (COCO Refined Masks), for more reliable benchmarking of object detectors, crucial for the future of object detection research. Paper: arxiv.org/abs/2403.18819 Code: Website: cocorem.xyz
📢Presenting 𝐃𝐄𝐏𝐈𝐂𝐓: Diffusion-Enabled Permutation Importance for Image Classification Tasks #ECCV2024 We use permutation importance to compute dataset-level explanations for image classifiers using diffusion models (without access to model parameters or training data!)
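For intuition: classic permutation importance shuffles one feature across the dataset and measures how much a frozen model's accuracy drops; DEPICT lifts this idea to images by using a diffusion model to "permute" visual attributes. A tabular numpy sketch of the underlying metric (illustrative only, not the paper's diffusion pipeline):

```python
import numpy as np

def accuracy(model, X, y):
    return (model(X) == y).mean()

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Importance of feature j = mean accuracy drop after shuffling column j,
    breaking its association with the labels while keeping its marginal."""
    rng = np.random.default_rng(seed)
    base = accuracy(model, X, y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])            # permute feature j across samples
            drops[j] += base - accuracy(model, Xp, y)
    return drops / n_repeats

# Toy "classifier" that only looks at feature 0; feature 1 is irrelevant,
# so shuffling it should yield ~zero importance.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)
model = lambda X: (X[:, 0] > 0).astype(int)
imp = permutation_importance(model, X, y)
```

Note the black-box nature of the recipe: it only ever calls `model(X)`, which is what lets the paper's version work without model parameters or training data.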
Excited to present our #CVPR2024 *Highlight* FAR on Friday at 10:30 a.m., Arch 4A-E Poster #31. Please feel free to stop by! FAR significantly improves correspondence-based methods using end-to-end pose prediction, making it applicable to many SOTA approaches!
📢 Presenting 𝐅𝐀𝐑: 𝐅𝐥𝐞𝐱𝐢𝐛𝐥𝐞, 𝐀𝐜𝐜𝐮𝐫𝐚𝐭𝐞 𝐚𝐧𝐝 𝐑𝐨𝐛𝐮𝐬𝐭 𝟔𝐃𝐨𝐅 𝐑𝐞𝐥𝐚𝐭𝐢𝐯𝐞 𝐂𝐚𝐦𝐞𝐫𝐚 𝐏𝐨𝐬𝐞 𝐄𝐬𝐭𝐢𝐦𝐚𝐭𝐢𝐨𝐧 #CVPR2024 FAR builds upon complementary Solver and Learning-Based works, yielding accurate *and* robust pose! crockwell.github.io/far/
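The high-level recipe is to combine a classical geometric solver's estimate with a learned regression, trusting the solver when correspondences are reliable and falling back on the learned prior when they are not. A deliberately simplified numpy sketch of such a confidence-weighted blend for the translation component (the sigmoid-on-inlier-count weighting is a placeholder, not FAR's actual learned weighting):

```python
import numpy as np

def blend_translation(t_solver, t_learned, num_inliers, k=20.0):
    """Blend two translation estimates with a confidence weight w in [0, 1].

    w -> 1 with many solver inliers (trust the geometric solver);
    w -> 0 with sparse correspondences (trust the learned prior).
    """
    w = 1.0 / (1.0 + np.exp(-(num_inliers - k) / 5.0))
    return w * np.asarray(t_solver) + (1.0 - w) * np.asarray(t_learned)

# Many inliers -> result hugs the solver estimate.
t_many = blend_translation([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], num_inliers=60)
# Few inliers -> result hugs the learned estimate.
t_few = blend_translation([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], num_inliers=2)
```

Rotations would need an interpolation that stays on the rotation manifold (e.g. quaternion slerp) rather than a linear blend; the linear case above is just to make the trust trade-off concrete.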
We've curated a 1-million-caption 3D dataset for Objaverse(-XL), correcting 200k potential misalignments in the original Cap3D captions. Our method employs a pre-trained text-to-3D model to rank rendered views and utilizes GPT-4 Vision. Each caption is linked to a point…