Stan Szymanowicz
@StanSzymanowicz
PhD student @Oxford_VGG | Previously intern @Google @microsoft | @Cambridge_Uni | IYPT 2016 GBR team captain https://github.com/szymanowiczs
⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test-time optimization. 🧵👇 (1/9)
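For intuition only: Bolt3D itself is a latent diffusion model, but the sketch below shows the general shape of "directly generating 3D Gaussians in one pass, with no test-time optimization", i.e. a small head that regresses per-pixel Gaussian parameters from image features. The class name, the 14-channel parameterization and the feature dimensions are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Toy head mapping per-pixel image features to 3D Gaussian parameters.

    Illustrative only: Bolt3D uses a latent diffusion model, but the output
    side (one feed-forward pass, no per-scene optimization) looks roughly
    like this.
    """
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # 3 (position) + 3 (log scale) + 4 (rotation quaternion)
        # + 1 (opacity) + 3 (RGB) = 14 channels per pixel.
        self.head = nn.Conv2d(feat_dim, 14, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> dict:
        out = self.head(feats)  # (B, 14, H, W)
        xyz, log_scale, rot, opacity, rgb = torch.split(out, [3, 3, 4, 1, 3], dim=1)
        return {
            "xyz": xyz,                                       # positions (e.g. offsets along camera rays)
            "scale": log_scale.exp(),                         # positive scales
            "rotation": nn.functional.normalize(rot, dim=1),  # unit quaternions
            "opacity": opacity.sigmoid(),
            "rgb": rgb.sigmoid(),
        }

# One forward pass yields a full set of Gaussians; nothing is optimized at test time.
feats = torch.randn(1, 64, 32, 32)
gaussians = GaussianHead()(feats)
print({k: v.shape for k, v in gaussians.items()})
```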
Delighted to share what our team has been working on at Google! After working for so long on sparse-view 3D, it's exciting and sobering to see how large-scale video models yield strong generalization and 3D consistency with minimal inductive biases goo.gle/4ddjJGJ
To help replicate the intuitive nature of shopping on a screen, learn how we’re using the latest #GenerativeAI models (including Veo) to transform 2D product images into immersive 3D visualizations for Google Shopping — from as few as 3 product images → goo.gle/4ddjJGJ
From as few as 3 photos to an immersive 3D shopping experience! 🤯 For the past couple of years, our team has been diving deep into generative AI (shoutout to Veo!) to transform 2D product images into interactive 3D visualizations. A big thank you to all my amazing teammates…
Survey on feed-forward 3D reconstruction: arxiv.org/pdf/2507.14501. Very thorough overview and a great resource. IMO in the future we'll see a lot of intersection between reconstruction and generation, because we often want to recover the full 3D scene but never observe it completely.
I find cooking rather delightful, can we have AGI doing the washing up instead?
My bar for AGI is far simpler: an AI cooking a nice dinner at anyone’s house for any cuisine. The Physical Turing Test is very likely harder than the Nobel Prize. Moravec’s paradox will continue to haunt us, looming larger and darker, for the decade to come.
Hang out with @lukemelas. Best case you join Cursor, worst case you meet the kindest, smartest dude
Message me if you’re at ICML and want to chat about coding models!
Two axes of eval for LLMs/VLMs are increasingly quality AND cost. Assuming users also care about both, my take is that Google is really well positioned to become the default proprietary model, given their success with (almost) free products (YouTube, Maps, Search)


Demo combining Gaussian splat rendering from @sparkjsdev with a collider mesh baked out from the Gaussians, using Rapier physics in @threejs
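One plausible way to do the "collider mesh baked out from the Gaussians" step, sketched below as a guess rather than the demo's actual recipe: splat Gaussian centers into a voxel density grid and extract a mesh with marching cubes. The triangles can then back a Rapier trimesh collider on the three.js side while the splats are rendered as usual. Grid resolution, the iso level and the opacity weighting are all assumptions.

```python
import numpy as np
from skimage import measure  # marching cubes

def bake_collider_mesh(centers, opacities, grid_res=128, iso=0.5):
    """Bake a rough collision mesh from a Gaussian splat scene.

    centers: (N, 3) Gaussian means, opacities: (N,). Accumulates opacity into
    a voxel grid and runs marching cubes; iso controls how "solid" a region
    must be to produce a surface.
    """
    lo, hi = centers.min(axis=0), centers.max(axis=0)
    grid = np.zeros((grid_res,) * 3, dtype=np.float32)
    # Map each Gaussian center to a voxel index and accumulate its opacity.
    idx = ((centers - lo) / (hi - lo + 1e-8) * (grid_res - 1)).astype(int)
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), opacities)
    verts, faces, _, _ = measure.marching_cubes(grid, level=iso)
    # Rescale vertices from grid coordinates back to world coordinates.
    verts = verts / (grid_res - 1) * (hi - lo) + lo
    return verts, faces  # e.g. feed these to a trimesh collider in Rapier
```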
Bolt3D is accepted to @ICCVConference 🥳 see you in Hawaii!
📣📣📣 Neural Inverse Rendering from Propagating Light 💡 just won Best Student Paper award at #CVPR!!!
📢📢📢 Neural Inverse Rendering from Propagating Light 💡 Our CVPR Oral introduces the first method for multiview neural inverse rendering from videos of propagating light, unlocking applications such as relighting light propagation videos, geometry estimation, or light…
Many congratulations to @jianyuan_wang, @MinghaoChen23, @n_karaev, Andrea Vedaldi, Christian Rupprecht and @davnov134 for winning the Best Paper Award @CVPR for "VGGT: Visual Geometry Grounded Transformer" 🥇🎉 🙌🙌 #CVPR2025!!!!!!
🚀 Just wrapped up my internship at Google with a paper submission! LODGE is a new large-scale level-of-detail 3DGS method, enabling efficient rendering even on mobile devices. Thanks to the amazing team at Google Zurich for making this possible! 📄 lodge-gs.github.io
Very interesting. I trained my version of LVSM a couple of months ago and thought that the block artefacts were due to some bug in my reimplementation, but RayZer suggests they could have been due to inaccurate camera poses
Supervised learning has held 3D Vision back for too long. Meet RayZer — a self-supervised 3D model trained with zero 3D labels: ❌ No supervision of camera & geometry ✅ Just RGB images And the wild part? RayZer outperforms supervised methods (as 3D labels from COLMAP are noisy)…
Paper summary for ... Stochastic Interpolants, Flow Matching [Lipman et al. 2023], Rectified Flows [Liu et al. 2023], I-Conditional Flow Matching [Tong et al. 2024], Inversion by Direct Iteration [Delbracio and Milanfar 2024], and Iterative α-(de)Blending [Heitz et al. 2023]
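The shared core of these methods, in the independent-coupling / rectified-flow setting, fits in a few lines: pair a noise sample with a data sample, interpolate linearly at a random time t, and regress a velocity network toward x1 - x0; sampling then integrates the learned ODE from noise to data. A minimal sketch on a toy 2-D distribution, where the network size, step counts and ring-shaped target are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

# Velocity network v_theta(x_t, t) on toy 2-D data.
model = nn.Sequential(nn.Linear(3, 128), nn.SiLU(),
                      nn.Linear(128, 128), nn.SiLU(),
                      nn.Linear(128, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def sample_data(n):
    # Toy target: a noisy ring of radius 2 (stand-in for the real data distribution).
    a = torch.rand(n, 1) * 2 * torch.pi
    return torch.cat([2 * a.cos(), 2 * a.sin()], dim=1) + 0.05 * torch.randn(n, 2)

for step in range(1000):
    x1 = sample_data(256)            # data sample
    x0 = torch.randn_like(x1)        # noise sample (independent coupling)
    t = torch.rand(x1.shape[0], 1)   # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1       # linear interpolant
    target = x1 - x0                 # constant velocity along the straight path
    loss = ((model(torch.cat([xt, t], dim=1)) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: integrate dx/dt = v_theta(x, t) from t=0 (noise) to t=1 with Euler steps.
x = torch.randn(512, 2)
with torch.no_grad():
    for i in range(50):
        t = torch.full((x.shape[0], 1), i / 50)
        x = x + model(torch.cat([x, t], dim=1)) / 50
```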
On my way to Singapore for #ICLR2025 ! Looking forward to discussing generative video models and how to make them more controllable. We will also be presenting CubeDiff (cubediff.github.io) on Friday afternoon. Stop by and say hi :)
check out Sherwin's poster at ICLR this week (and his follow-up at CVPR)!
📢Excited to be at #ICLR2025 for our paper: VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Poster: Thu 3-5:30 PM (#134) Website: snap-research.github.io/vd3d/ Code: github.com/snap-research/… Also check out our #CVPR2025 follow-up AC3D: snap-research.github.io/ac3d/
Woah impressive. At a glance, the key seems to be collecting robot data across many different environments, mixing it with lab robot data, open-source robot datasets and non-robot web data. Exciting!
We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️
Introducing JEGAL👐 JEGAL can match hand gestures with words & phrases in speech/text. By only looking at hand gestures, JEGAL can perform tasks like determining who is speaking, or whether a keyword (e.g. beautiful) is gestured. More about our latest research on co-speech gestures 🧵👇
Last Friday was my last day at @GoogleAI - very grateful for an amazing experience. I thought I'd wear my propeller hat one last time - the reactions to it were divided between 'fun hat', 'congrats on your first week' and 'I can't take you seriously when you're wearing that' 😅

I find it cool that the Llama 4 architecture builds on the sparse mixture-of-experts architecture from almost 10 years ago (2017!): arxiv.org/pdf/1701.06538. Old papers for the win
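For reference, the core of that 2017 sparsely-gated MoE layer is a small gating network that picks the top-k experts per token and mixes only those experts' outputs, so parameter count grows with the number of experts while per-token compute stays roughly constant. A minimal sketch, omitting the noisy gating and load-balancing losses of the original paper; the expert width, expert count and k are arbitrary here:

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Minimal top-k sparsely-gated mixture-of-experts layer."""
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        logits = self.gate(x)                              # (tokens, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)  # keep only k experts per token
        weights = topk_vals.softmax(dim=-1)                # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            for e, expert in enumerate(self.experts):
                mask = idx == e                            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(SparseMoE(64)(tokens).shape)  # torch.Size([16, 64])
```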


Wow LVSM code is online - awesome stuff!
Our paper LVSM has been accepted as an oral presentation at #ICLR2025! See you in Singapore! We’ve just released the code and checkpoints—check it out here: github.com/haian-jin/LVSM.🚀