Felix Wimbauer @ CVPR2025
@felixwimbauer
PhD Student @tumcvg, Prev Intern @Meta GenAI, MSc CompSci @UniofOxford, 3D Computer Vision, http://fwmb.github.io
Can you train a model for pose estimation directly on casual videos without supervision? Turns out you can! In our #CVPR2025 paper AnyCam, we directly train on YouTube videos and achieve SOTA results by using an uncertainty-based flow loss and monocular priors! ⬇️
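The uncertainty-based flow loss can be pictured roughly like this (a minimal sketch with assumed names and shapes, not the paper's exact formulation): the network predicts a per-pixel uncertainty that down-weights flow residuals on independently moving objects.

```python
import torch

def uncertainty_flow_loss(flow_obs, flow_induced, log_sigma):
    # flow_obs:     optical flow from an off-the-shelf estimator, (B, 2, H, W)
    # flow_induced: flow induced by the predicted camera pose and depth, (B, 2, H, W)
    # log_sigma:    predicted per-pixel log-uncertainty, (B, 1, H, W)
    err = (flow_obs - flow_induced).abs().sum(dim=1, keepdim=True)
    # High-uncertainty pixels (e.g. moving objects) are down-weighted;
    # the +log_sigma term penalizes simply inflating the uncertainty.
    return (err * torch.exp(-log_sigma) + log_sigma).mean()
```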
We're back this summer with ZurichCV#10 on the 7th of August at the @ETH_AI_Center! Felix Wimbauer (TUM) will talk about 3D scene understanding and Joan Puigcerver (DeepMind) about scaling computer vision architectures. RSVP below.
🦖 We present “Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion”. #ICCV2025 🌍: visinf.github.io/scenedino/ 📃: arxiv.org/abs/2507.06230 💻: github.com/tum-vision/sce… 🤗: huggingface.co/spaces/jev-ale… w/ A. Jevtić @felixwimbauer @olvr_hhn C. Rupprecht @stefanroth D. Cremers
Bolt3D is accepted to @ICCVConference 🥳 see you in Hawaii!
⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test time optimization. 🧵👇 (1/9)
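For context, "directly generates 3D Gaussians" means the model outputs the parameters of a Gaussian-splatting scene representation; a common parameterization (assumed here for illustration, not necessarily Bolt3D's exact output format) looks like:

```python
from dataclasses import dataclass
import torch

@dataclass
class GaussianScene:
    # One row per Gaussian primitive; typical splatting parameterization.
    means: torch.Tensor      # (N, 3) positions in world space
    scales: torch.Tensor     # (N, 3) per-axis extents
    rotations: torch.Tensor  # (N, 4) unit quaternions
    opacities: torch.Tensor  # (N, 1) alpha values
    colors: torch.Tensor     # (N, 3) RGB (or SH coefficients)

# A feed-forward generator that emits such a scene directly means rendering
# new views needs no per-scene test-time optimization.
```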
This Saturday at CVPR, don't miss Oral Session 3A. Vision all-stars @QianqianWang5, @jin_linyi, @zhengqi_li are presenting MegaSaM, CUT3R, and Stereo4D. The posters are right after, and the whole crew will be there. It'll be fun. Drop by.
Excited to be attending CVPR 2025 this week in Nashville! I’ll be presenting our recent work: “4Deform: Neural Surface Deformation for Robust Shape Interpolation” 📍 Poster session: [13th June 4pm-6pm poster #111] #CVPR2025 #computervision
Looking forward to presenting our paper "Finsler multi-dimensional scaling" at #CVPR2025 on Sunday 10:30am, Poster 462! We investigate a largely uncharted research direction in computer vision: Finsler manifolds...
Can we match vision and language representations without any supervision or paired data? Surprisingly, yes! Our #CVPR2025 paper with @neekans and Daniel Cremers shows that the pairwise distances in both modalities are often enough to find correspondences. ⬇️1/4
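The core idea can be illustrated with a toy brute-force matcher (my own sketch, feasible only for tiny sets; a proper assignment solver is needed at scale): find the permutation that best aligns the two pairwise-distance matrices.

```python
import itertools
import numpy as np

def match_by_pairwise_distances(emb_a, emb_b):
    # emb_a: (n, d_a) embeddings from modality A (e.g. vision)
    # emb_b: (n, d_b) embeddings from modality B (e.g. language)
    D_a = np.linalg.norm(emb_a[:, None] - emb_a[None], axis=-1)
    D_b = np.linalg.norm(emb_b[:, None] - emb_b[None], axis=-1)
    n = len(emb_a)
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        p = list(perm)
        cost = np.linalg.norm(D_a - D_b[np.ix_(p, p)])
        if cost < best_cost:
            best_perm, best_cost = p, cost
    return best_perm  # best_perm[i] is the index in B matched to item i in A
```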
Full draft of the SLAM Handbook now released --- available as a free PDF, with a printed version coming soon. Now including Part 3, "From SLAM to Spatial AI" (I knew it would catch on eventually), with contributions from @HideMatsu82 and me. #SpatialAI github.com/SLAM-Handbook-…
Excited to share ☀️Lightspeed⚡, a photorealistic, synthetic dataset with ground truth pose used for benchmarking alongside DynPose-100K! Now available for download: huggingface.co/datasets/nvidi… Paper accepted to #CVPR2025: arxiv.org/abs/2504.17788
Ever wish YouTube had 3D labels? 🚀Introducing🎥DynPose-100K🎥, an Internet-scale collection of diverse videos annotated with camera pose! Applications include camera-controlled video generation🤩and learned dynamic pose estimation😯 Download: huggingface.co/datasets/nvidi…
Paper of the day: Pataki et al., "MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion" -- github.com/cvg/mpsfm Depth estimators work great. Time to integrate them into SfM pipelines.
🚀Excited to announce our @CVPR 2025 paper: Unbiasing through Textual Descriptions! We release UTD-descriptions for 1.9M videos and object-debiased splits for 12 datasets! 🔗Project: utd-project.github.io @NagraniArsha Bernt Schiele @HildeKuehne Christian Rupprecht 🧵👇
Check out how AnyCam is using Rerun viz on their project page 🔥🔥🔥
Check out our recent #CVPR2025 paper AnyCam, a method for pose estimation in casual videos! 1️⃣ Can be directly trained on casual videos without the need for 3D annotation. 2️⃣ Based around a feed-forward transformer and light-weight refinement. ♦️ fwmb.github.io/anycam/
Introducing LoftUp! A stronger (than ever) and lightweight feature upsampler for vision encoders that can boost performance on dense prediction tasks by 20%–100%! Easy to plug into models like DINOv2, CLIP, SigLIP — simple design, big gains. Try it out! github.com/andrehuang/lof…
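To see where such an upsampler slots in (a shape-level sketch with dummy features; see the repo for LoftUp's actual interface): dense prediction heads have to upsample coarse ViT patch features, and the naive baseline is bilinear interpolation.

```python
import torch
import torch.nn.functional as F

# Dummy low-resolution patch features, e.g. a ViT-S/14 on a 224x224 image
# yields a 16x16 grid of 384-dim tokens (shapes assumed for illustration).
feats = torch.randn(1, 384, 16, 16)

# Naive baseline: bilinear upsampling to image resolution.
up = F.interpolate(feats, size=(224, 224), mode="bilinear", align_corners=False)

# A learned upsampler like LoftUp replaces this step, producing sharper,
# image-aligned high-resolution features for dense prediction heads.
```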
Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction Weirong Chen, Ganlin Zhang, @felixwimbauer , Rui Wang, @neekans Andrea Vedaldi, Daniel Cremers tl;dr even for non-rigid SfM you can do BA on static parts -> improves everything. arxiv.org/abs/2504.14516
Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction @wrchen530, @zhang_ganlin, @felixwimbauer, Rui Wang, @neekans, Andrea Vedaldi, Daniel Cremers tl;dr: learning-based 3D point tracker decouples camera and object-based motion arxiv.org/abs/2504.14516
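A toy version of the "BA on static parts" idea (illustrative names and conventions, not the paper's implementation): only tracks labelled static contribute reprojection residuals that constrain the camera, while points on moving objects are excluded.

```python
import numpy as np

def static_reprojection_residuals(points_3d, observations, K, R, t, static_mask):
    # points_3d:    (N, 3) world points from a point tracker
    # observations: (N, 2) observed pixel locations in one frame
    # K, R, t:      intrinsics (3, 3), rotation (3, 3), translation (3,)
    # static_mask:  (N,) bool, True for points judged static
    X = points_3d[static_mask]
    x = (K @ (R @ X.T + t[:, None])).T   # project static points into the camera
    x = x[:, :2] / x[:, 2:3]             # perspective divide
    # Residuals fed to the bundle-adjustment optimizer; dynamic points
    # never pull on the camera parameters.
    return (x - observations[static_mask]).ravel()
```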
Check out our latest #CVPR2025 work AnyCam! Instead of relying on videos with 3D annotations, AnyCam learns pose estimation directly from unlabeled dynamic videos. This is an interesting alternative to methods like Monst3r, which rely on expensive pose labels during training!