Chris Rockwell
@_crockwell
PhD student in #ComputerVision at @UmichCSE. Views are my own.
Excited to share ☀️Lightspeed⚡, a photorealistic, synthetic dataset with ground truth pose used for benchmarking alongside DynPose-100K! Now available for download: huggingface.co/datasets/nvidi… Paper accepted to #CVPR2025: arxiv.org/abs/2504.17788
Ever wish YouTube had 3D labels? 🚀Introducing🎥DynPose-100K🎥, an Internet-scale collection of diverse videos annotated with camera pose! Applications include camera-controlled video generation🤩and learned dynamic pose estimation😯 Download: huggingface.co/datasets/nvidi…
Hello! If you are interested in dynamic 3D or 4D, don't miss oral session 3A at 9 am on Saturday: @zhengqi_li will be presenting "MegaSaM," I'll be presenting "Stereo4D," and @QianqianWang5 will be presenting "CUT3R."
Excited to share our CVPR 2025 paper on cross-modal space-time correspondence! We present a method to match pixels across different modalities (RGB-Depth, RGB-Thermal, Photo-Sketch, and cross-style images) — trained entirely using unpaired data and self-supervision. Our…
Ever wondered how a scene sounds👂 when you interact👋 with it? Introducing our #CVPR2025 work "Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes" -- we make 3D scene reconstructions audibly interactive! yimingdou.com/hearing_hands/
Can AI image detectors keep up with new fakes? Mostly, no. Existing detectors are trained using a handful of models. But there are thousands in the wild! Our work, Community Forensics, uses 4800+ generators to train detectors that generalize to new fakes. #CVPR2025 🧵 (1/5)
Hello! If you like pretty images and videos and want a rec for a CVPR oral session, you should def go to Image/Video Gen, Friday at 9am: I'll be presenting "Motion Prompting," @RyanBurgert will be presenting "Go with the Flow," and @ChangPasca1650 will be presenting "LookingGlass."
Cameras are key to modeling our dynamic 3D visual world. Can we unlock the 𝘥𝘺𝘯𝘢𝘮𝘪𝘤 3𝘋 𝘐𝘯𝘵𝘦𝘳𝘯𝘦𝘵?! 🌎 📸 𝗗𝘆𝗻𝗣𝗼𝘀𝗲-𝟭𝟬𝟬𝗞 is our answer! @_crockwell has curated Internet-scale videos with camera pose annotations for you 🤩 Download: huggingface.co/datasets/nvidi…
🌌 NVIDIA Cosmos -- our World Foundation Model platform! Super excited to have made core contributions in multiple aspects. Physical AI is key to modeling the universe of worlds 🌎! 75-page tech report 📄: research.nvidia.com/publication/20… Try them out now 😲! github.com/NVIDIA/Cosmos
Whether you're a researcher or developer, #NVIDIACosmos world foundation models are now openly available under our permissive license to the physical AI community via NGC & @huggingface. 🤗 #CES2025 See how Cosmos is democratizing #physicalAI development: nvda.ws/3DJwE5v
Video generation models exploded onto the scene in 2024, sparked by the release of Sora from OpenAI. I wrote a blog post on key techniques that are used in building large video generation models: yenchenlin.me/blog/2025/01/0…
Introducing 👀Stereo4D👀 A method for mining 4D from internet stereo videos. It enables large-scale, high-quality, dynamic, *metric* 3D reconstructions, with camera poses and long-term 3D motion trajectories. We used Stereo4D to make a dataset of over 100k real-world 4D scenes.
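Why do stereo videos unlock *metric* (real-world-scale) reconstruction? Because the baseline between the two cameras is known, the standard rectified-stereo relation depth = focal × baseline / disparity recovers absolute depth. A minimal numpy sketch of that relation (the rig numbers below are illustrative, not values from the paper):

```python
import numpy as np

def disparity_to_metric_depth(disparity_px, focal_px, baseline_m):
    """Convert a rectified-stereo disparity map (pixels) to metric depth (meters).

    Standard pinhole relation: depth = f * B / d. Invalid (zero or negative)
    disparities are mapped to infinity.
    """
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(disparity_px, np.inf)
    valid = disparity_px > 0
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth

# Toy example: 700 px focal length, 6.5 cm baseline (an assumed rig spacing),
# 10 px disparity -> 4.55 m; zero disparity -> infinitely far.
depth = disparity_to_metric_depth([[10.0, 0.0]], focal_px=700.0, baseline_m=0.065)
```

The same relation explains why monocular internet video cannot give metric scale: with no known baseline, depth is only recoverable up to an unknown scale factor.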
What happens when you train a video generation model to be conditioned on motion? Turns out you can perform "motion prompting," just like you might prompt an LLM! Doing so enables many different capabilities. Here’s a few examples – check out this thread 🧵 for more results!
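To condition a video model on motion, sparse point tracks have to be turned into a spatio-temporal signal the network can ingest. A minimal numpy sketch of the simplest possible encoding, a binary rasterized track volume; this format is an illustrative guess, not the paper's actual representation:

```python
import numpy as np

def rasterize_tracks(tracks, T, H, W):
    """Rasterize sparse (x, y) point tracks into a binary (T, H, W) volume
    that a video model could take as a conditioning input.

    tracks: array of shape (N, T, 2), one (x, y) position per track per frame.
    Off-screen points are simply skipped.
    """
    vol = np.zeros((T, H, W), dtype=np.float32)
    for track in tracks:
        for t, (x, y) in enumerate(track):
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < W and 0 <= yi < H:
                vol[t, yi, xi] = 1.0
    return vol

# One track moving rightward across a 4-frame, 8x8 clip.
tracks = np.array([[[1.0, 4.0], [2.0, 4.0], [3.0, 4.0], [4.0, 4.0]]])
cond = rasterize_tracks(tracks, T=4, H=8, W=8)
```

In practice such a volume would be concatenated with (or cross-attended against) the model's latent video features; the "prompting" analogy comes from swapping in different track sets at inference time.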
We present Global Matching Random Walks, a simple self-supervised approach to the Tracking Any Point (TAP) problem, accepted to #ECCV2024. We train a global matching transformer to find cycle consistent tracks through video via contrastive random walks (CRW).
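At its core, a contrastive random walk builds soft affinity matrices between adjacent frames, walks forward then backward in time, and supervises each point to return to itself, so no labels are needed. A numpy sketch of that cycle-consistency loss (an illustration of the general CRW objective, not the paper's global matching transformer):

```python
import numpy as np

def softmax_affinity(feats_a, feats_b, temperature=0.07):
    """Row-softmax over feature similarities: transition probs from a to b."""
    sim = feats_a @ feats_b.T / temperature          # (N, N) similarities
    sim = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    expd = np.exp(sim)
    return expd / expd.sum(axis=1, keepdims=True)

def cycle_consistency_loss(feats_t, feats_t1):
    """Walk t -> t+1 -> t; penalize round trips that miss the start node."""
    fwd = softmax_affinity(feats_t, feats_t1)        # P(t -> t+1)
    bwd = softmax_affinity(feats_t1, feats_t)        # P(t+1 -> t)
    cycle = fwd @ bwd                                # P(t -> t+1 -> t)
    # Cross-entropy against the identity: -log prob of returning home.
    return -np.log(np.diag(cycle) + 1e-12).mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
low = cycle_consistency_loss(feats, feats)                        # perfect matches
high = cycle_consistency_loss(feats, rng.normal(size=(8, 16)))    # unrelated frames
```

The "contrastive" part is that the softmax makes every other point a negative: a point can only raise its own return probability by lowering everyone else's, which is what forces discriminative features.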
📢 Introducing our #ECCV2024 work, COCO-ReM (COCO Refined Masks), for more reliable benchmarking of object detectors, crucial for the future of object detection research. Paper: arxiv.org/abs/2403.18819 Code: Website: cocorem.xyz
📢Presenting 𝐃𝐄𝐏𝐈𝐂𝐓: Diffusion-Enabled Permutation Importance for Image Classification Tasks #ECCV2024 We use permutation importance to compute dataset-level explanations for image classifiers using diffusion models (without access to model parameters or training data!)
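For intuition: classic permutation importance shuffles one feature across the dataset and measures how much a frozen model's accuracy drops; DEPICT lifts this idea to images by using a diffusion model to "permute" visual attributes. A tabular numpy sketch of the underlying metric (illustrative only, not the paper's diffusion pipeline):

```python
import numpy as np

def accuracy(model, X, y):
    return (model(X) == y).mean()

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Importance of feature j = mean accuracy drop after shuffling column j,
    breaking its association with the labels while keeping its marginal."""
    rng = np.random.default_rng(seed)
    base = accuracy(model, X, y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])            # permute feature j across samples
            drops[j] += base - accuracy(model, Xp, y)
    return drops / n_repeats

# Toy "classifier" that only looks at feature 0; feature 1 is irrelevant,
# so shuffling it should yield ~zero importance.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)
model = lambda X: (X[:, 0] > 0).astype(int)
imp = permutation_importance(model, X, y)
```

Note the black-box nature of the recipe: it only ever calls `model(X)`, which is what lets the paper's version work without model parameters or training data.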
Excited to present our #CVPR2024 *Highlight* FAR on Friday at 10:30 a.m., Arch 4A-E Poster #31. Please feel free to stop by! FAR significantly improves correspondence-based methods using end-to-end pose prediction, making it applicable to many SOTA approaches!
📢 Presenting 𝐅𝐀𝐑: 𝐅𝐥𝐞𝐱𝐢𝐛𝐥𝐞, 𝐀𝐜𝐜𝐮𝐫𝐚𝐭𝐞 𝐚𝐧𝐝 𝐑𝐨𝐛𝐮𝐬𝐭 𝟔𝐃𝐨𝐅 𝐑𝐞𝐥𝐚𝐭𝐢𝐯𝐞 𝐂𝐚𝐦𝐞𝐫𝐚 𝐏𝐨𝐬𝐞 𝐄𝐬𝐭𝐢𝐦𝐚𝐭𝐢𝐨𝐧 #CVPR2024 FAR builds upon complementary Solver and Learning-Based works, yielding accurate *and* robust pose! crockwell.github.io/far/
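The high-level recipe is to combine a classical geometric solver's estimate with a learned regression, trusting the solver when correspondences are reliable and falling back on the learned prior when they are not. A deliberately simplified numpy sketch of such a confidence-weighted blend for the translation component (the sigmoid-on-inlier-count weighting is a placeholder, not FAR's actual learned weighting):

```python
import numpy as np

def blend_translation(t_solver, t_learned, num_inliers, k=20.0):
    """Blend two translation estimates with a confidence weight w in [0, 1].

    w -> 1 with many solver inliers (trust the geometric solver);
    w -> 0 with sparse correspondences (trust the learned prior).
    """
    w = 1.0 / (1.0 + np.exp(-(num_inliers - k) / 5.0))
    return w * np.asarray(t_solver) + (1.0 - w) * np.asarray(t_learned)

# Many inliers -> result hugs the solver estimate.
t_many = blend_translation([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], num_inliers=60)
# Few inliers -> result hugs the learned estimate.
t_few = blend_translation([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], num_inliers=2)
```

Rotations would need an interpolation that stays on the rotation manifold (e.g. quaternion slerp) rather than a linear blend; the linear case above is just to make the trust trade-off concrete.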
We've curated a 1-million-caption 3D dataset for Objaverse(-XL), correcting 200k potential misalignments in the original Cap3D captions. Our method employs a pre-trained text-to-3D model to rank rendered views and utilizes GPT-4 Vision. Each caption is linked to a point…