Bowen Wen
@bowenwen_me
Senior Research Scientist @NVIDIA, Computer Vision, Robotics | previously@GoogleX, @Meta, @Amazon. Opinions are my own.
📢Time to upgrade your depth camera! Introducing **FoundationStereo**, a foundation model for zero-shot stereo depth estimation (accepted to CVPR 2025 with full scores) [1/n] Code: github.com/NVlabs/Foundat… Website: nvlabs.github.io/FoundationSter… Paper: arxiv.org/abs/2501.09898
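For anyone wiring a stereo model like this into a pipeline, the last step is usually converting the predicted disparity map into metric depth using the rectified-stereo relation depth = fx · baseline / disparity. The sketch below is not FoundationStereo's API; it only shows that standard conversion, and the focal length, baseline, and disparity values are made-up placeholders for your own calibration.

```python
import numpy as np

def disparity_to_depth(disparity, fx, baseline_m, min_disp=1e-6):
    """Convert a predicted disparity map (pixels) to metric depth (meters).

    Standard rectified-stereo geometry: depth = fx * baseline / disparity,
    where fx is the focal length in pixels and baseline_m the stereo baseline in meters.
    """
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.full_like(disparity, np.nan)      # invalid pixels stay NaN
    valid = disparity > min_disp
    depth[valid] = fx * baseline_m / disparity[valid]
    return depth

# Toy example: a 480x640 disparity map, fx = 700 px, 9.5 cm baseline (placeholder values).
disp = np.random.uniform(1.0, 64.0, size=(480, 640)).astype(np.float32)
depth = disparity_to_depth(disp, fx=700.0, baseline_m=0.095)
print(depth.shape, np.nanmin(depth), np.nanmax(depth))
```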
Stereo depth sensing is set to revolutionize 3D perception. Can't wait to see the new innovations and applications that emerge! #3Dperception #computervision #robotics realsenseai.com/news-insights/…
Want a better representation for collision avoidance and grasping from dense clutter? Try out RaySt3R: our new 3D shape completion pipeline from single-view RGBD (led by @BDuisterhof)!
Imagine if robots could fill in the blanks in cluttered scenes. ✨ Enter RaySt3R: a single masked RGB-D image in, complete 3D out. It infers depth, object masks, and confidence for novel views, and merges the predictions into a single point cloud. rayst3r.github.io
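To make the "merge the predictions into a single point cloud" step concrete, here is a minimal, generic sketch (my own illustration, not RaySt3R's implementation): back-project each predicted depth map with the camera intrinsics K, keep points that pass the object mask and a confidence threshold, transform them into a shared world frame with each view's pose, and concatenate. All variable names and the 0.5 threshold are assumptions.

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map (H, W) into camera-frame 3D points (H*W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def merge_views(preds, K, conf_thresh=0.5):
    """Fuse per-view (depth, boolean mask, confidence, cam-to-world pose) into one point cloud."""
    points = []
    for depth, mask, conf, T_wc in preds:
        keep = (mask & (conf > conf_thresh) & (depth > 0)).reshape(-1)
        pts_cam = backproject(depth, K)[keep]
        # Rigid transform into the shared world frame: p_w = R p_c + t.
        pts_world = pts_cam @ T_wc[:3, :3].T + T_wc[:3, 3]
        points.append(pts_world)
    return np.concatenate(points, axis=0)
```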
More progress on developing a straightforward method to collect first-person (ego) and third-person (exo) data for robotic training with @rerundotio . I’ve been using the HO-cap dataset to establish a baseline, and here are some updates I’ve made: * added in MANO parameters from…
After a short detour to Mast3r SLAM, I’m starting back up on exo-ego data collection, this time bringing in the HOCap dataset (irvlutd.github.io/HOCap/). It has a permissive license, MANO poses, and RGB-D with camera parameters! I managed to get the camera and images so far.
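Since the dataset ships MANO hand poses, one quick way to turn them into meshes is the MANO layer in the smplx package. This is a generic sketch with all-zero placeholder parameters and a hypothetical model path, not HOCap's own tooling; check the dataset loader for the exact parameter layout (e.g., PCA vs. full axis-angle pose).

```python
import torch
import smplx  # pip install smplx; also requires the MANO model files from the MANO project

# Placeholder values standing in for one frame's annotation.
betas = torch.zeros(1, 10)          # hand shape coefficients
global_orient = torch.zeros(1, 3)   # wrist rotation (axis-angle)
hand_pose = torch.zeros(1, 45)      # 15 joints x 3 axis-angle params (use_pca=False)

mano = smplx.create(
    model_path="path/to/mano",      # hypothetical folder containing MANO_RIGHT.pkl
    model_type="mano",
    is_rhand=True,
    use_pca=False,
)
out = mano(betas=betas, global_orient=global_orient, hand_pose=hand_pose)
print(out.vertices.shape, out.joints.shape)  # (1, 778, 3) mesh vertices plus hand joints
```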
Why don't you just say "this message is for Chinese researchers"? Besides, I am also amazed by your superpower to recognize the ethnicity of anonymous reviewers. Otherwise, how could one just assume a negative review is from a WeChat user?
Repeat after me: not everybody has a slave army of MSc students that (will do anything to have a paper and) executes every possible (boring) quantitative comparison you can think of. Seems I should post this on wechat though.
Do not miss this great research internship opportunity!
📢📢We have a last-minute internship opening on my team at @NVIDIAAI for this summer. If you are interested and have experience with large feedforward reconstruction models or post-training image/video diffusion models, please get in touch!
Explore a variety of perception models and systems from #NVIDIAResearch that support a unified 3D perception stack for #robotics. These tools enable robots to understand and interact with unfamiliar environments in real-time. 🤖 Learn more 👉 nvda.ws/4jXKSPE
Join us today to learn how to push the boundaries of stereo depth estimation!!
Come and say 👋 tomorrow (06/13) for our oral (1pm, Karl Dean Ballroom) and poster sessions (4pm, ExHall D, #81)! #CVPR2025 @CVPR @CVPRConf @NVIDIAAIDev @NVIDIARobotics #NVIDIA
Had a great PIRA workshop at CVPR, thanks to Vincent Vanhoucke, Kartik Iyer, @bowenwen_me, Kartik Venkataraman, and @tomhodan
I use two factors to analyze robot autonomy: environment diversity and task diversity. If a robot just replays data from a single task and environment, of course it’ll succeed. Real autonomy lies in pushing toward the top-right corner of this figure—generalizing both.
Rodney Brooks, robotics legend and iRobot co-founder, just spoke at Stanford about our current AI hype cycle. His blunt take: We're repeating the same mistakes. On hype cycles: We're like "Five-year-olds playing soccer—they all run to the ball. Nothing else is important." This…
Don’t miss this exciting workshop happening tomorrow at #GTCParis.
🤖 Build smarter robots at #GTCParis. Join our new full-day workshop on Tuesday, June 10, to master simulation-first #robotics: 🔧 Structure modular assets 🔌 Test with #ROS 2 📊 Train with synthetic data ⚡ Accelerate AI with NVIDIA GPUs ➡️ nvda.ws/3HiMkhT
Incredible learned behavior (assuming no human intervention) at 48:05, where it failed a couple of times but then suddenly figured out how to make it right. Amazing progress!
Uncut hour-long footage of Figure 02 autonomously transferring and flattening packages for a scanner down the line. The robot is using Figure’s Helix model, a generalist VLA that now incorporates upgrades in temporal memory and force feedback.
Kudos to the Aria team and exciting support of FoundationStereo (nvlabs.github.io/FoundationSter…)! High-quality 3D human demonstration data collection for robot learning will be a breeze 😌 @NVIDIAAIDev @AIatMeta #xr #technology
Here’s a great breakdown of Aria Gen 2 and why these research glasses are so impressive! 📌 Few highlights: 🕶️ Design & Comfort - Lighter (74–76g) with folding arms and 8 size options for better fit and comfort 📸 Vision Upgrades - 4 HDR cameras (up from 2), now with 120 dB…
Thrilled to be nominated as a best paper award candidate!! Looking forward to more chats at CVPR. #CVPR2025
Come and join us! Also make sure you have signed up for our social event (events.nvidia.com/nvcvprresearch…) and get a free GPU 😍 #CVPR @CVPR
🔎 Explore NVIDIA’s technical workshops at #CVPR2025—dive into volumetric video, 3D point cloud deep learning, and hands-on Kaolin demos. 📝 Plus, discover 60+ papers advancing generative AI, AV, and robotics. Join us in Nashville to push computer vision forward ➡️…
Super cool project! Glad to see FoundationPose (github.com/NVlabs/Foundat…) enables learning from low-cost hand-object demonstrations.
🧑🤖 Introducing Human2Sim2Robot! 💪🦾 Learn robust dexterous manipulation policies from just one human RGB-D video. Our Real→Sim→Real framework crosses the human-robot embodiment gap using RL in simulation. #Robotics #DexterousManipulation #Sim2Real 🧵1/7
Synthetic data generation tools like MimicGen create large sim datasets with ease, but using them in the real world is difficult due to the large sim-to-real gap. Our new work uses simple co-training to unlock the potential of synthetic sim data for real-world manipulation!
How to use simulation data for real-world robot manipulation? We present sim-and-real co-training, a simple recipe for manipulation. We demonstrate that sim data can significantly enhance real-world performance, even with notable differences between the simulated and real setups. (1/n)
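As a rough illustration of what sim-and-real co-training can look like in practice (my own minimal sketch, not the paper's exact recipe), each gradient step mixes simulated and real demonstration batches into one behavior-cloning loss, with a mixing weight controlling how much the synthetic data influences the update. The policy, dimensions, and weighting below are all placeholder choices.

```python
import torch
import torch.nn as nn

def cotrain_step(policy, sim_batch, real_batch, optimizer, sim_weight=0.5):
    """One co-training step: mix sim and real demos in a single behavior-cloning loss."""
    obs = torch.cat([sim_batch["obs"], real_batch["obs"]], dim=0)
    act = torch.cat([sim_batch["act"], real_batch["act"]], dim=0)
    pred = policy(obs)
    # Weight sim vs. real samples so neither domain dominates the gradient.
    weights = torch.cat([
        torch.full((len(sim_batch["obs"]),), sim_weight),
        torch.full((len(real_batch["obs"]),), 1.0 - sim_weight),
    ]).to(obs.device)
    loss = (weights * ((pred - act) ** 2).mean(dim=-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random stand-in batches (obs_dim=32, act_dim=7).
policy = nn.Linear(32, 7)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
sim_batch = {"obs": torch.randn(64, 32), "act": torch.randn(64, 7)}
real_batch = {"obs": torch.randn(16, 32), "act": torch.randn(16, 7)}
print(cotrain_step(policy, sim_batch, real_batch, opt))
```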
Curious whether it can stay consistent from frame to frame on video. Also about processing speed, and how it handles roundish things like people.
I was preparing a video to introduce our lab @IRVLUTD for a meeting. Happy to share the video here! We are looking forward to collaborating with both academia and industry. Please feel free to reach out.