Stan Szymanowicz
@StanSzymanowicz
PhD student @Oxford_VGG | Previously intern @Google @microsoft | @Cambridge_Uni | IYPT 2016 GBR team captain https://github.com/szymanowiczs
⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test-time optimization. 🧵👇 (1/9)
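For intuition only: Bolt3D itself is a latent diffusion model, but the sketch below shows the general shape of "directly generating 3D Gaussians in one pass, with no test-time optimization", i.e. a small head that regresses per-pixel Gaussian parameters from image features. The class name, the 14-channel parameterization and the feature dimensions are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Toy head mapping per-pixel image features to 3D Gaussian parameters.

    Illustrative only: Bolt3D uses a latent diffusion model, but the output
    side (one feed-forward pass, no per-scene optimization) looks roughly
    like this.
    """
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # 3 (position) + 3 (log scale) + 4 (rotation quaternion)
        # + 1 (opacity) + 3 (RGB) = 14 channels per pixel.
        self.head = nn.Conv2d(feat_dim, 14, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> dict:
        out = self.head(feats)  # (B, 14, H, W)
        xyz, log_scale, rot, opacity, rgb = torch.split(out, [3, 3, 4, 1, 3], dim=1)
        return {
            "xyz": xyz,                                       # positions (e.g. offsets along camera rays)
            "scale": log_scale.exp(),                         # positive scales
            "rotation": nn.functional.normalize(rot, dim=1),  # unit quaternions
            "opacity": opacity.sigmoid(),
            "rgb": rgb.sigmoid(),
        }

# One forward pass yields a full set of Gaussians; nothing is optimized at test time.
feats = torch.randn(1, 64, 32, 32)
gaussians = GaussianHead()(feats)
print({k: v.shape for k, v in gaussians.items()})
```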
Delighted to share what our team has been working on at Google! After working for so long on sparse-view 3D, it's exciting and sobering to see how large-scale video models yield strong generalization and 3D consistency with minimal inductive biases goo.gle/4ddjJGJ
To help replicate the intuitive nature of shopping on a screen, learn how we’re using the latest #GenerativeAI models (including Veo) to transform 2D product images into immersive 3D visualizations for Google Shopping — from as few as 3 product images → goo.gle/4ddjJGJ
From as few as 3 photos to an immersive 3D shopping experience! 🤯 For the past couple of years, our team has been diving deep into generative AI (shoutout to Veo!) to transform 2D product images into interactive 3D visualizations. A big thank you to all my amazing teammates…
Survey on feed-forward 3D reconstruction: arxiv.org/pdf/2507.14501. Very thorough overview and a great resource. IMO in the future we'll see a lot of intersection between reconstruction and generation, because we often want to recover the full 3D scene but never observe it completely.
I find cooking rather delightful, can we have AGI doing the washing up instead?
My bar for AGI is far simpler: an AI cooking a nice dinner at anyone’s house for any cuisine. The Physical Turing Test is very likely harder than the Nobel Prize. Moravec’s paradox will continue to haunt us, looming larger and darker, for the decade to come.
Hang out with @lukemelas. Best case you join Cursor, worst case you meet the kindest, smartest dude
Message me if you’re at ICML and want to chat about coding models!
Two axes of eval for LLMs/VLMs are increasingly quality AND cost. Assuming users also care about both, my take is that Google is really well positioned to become the default proprietary model, given their success with (almost) free products (YouTube, Maps, Search)


Demo combining Gaussian splat rendering from @sparkjsdev with a collider mesh baked out from the Gaussians, using Rapier physics in @threejs
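One plausible way to do the "collider mesh baked out from the Gaussians" step, sketched below as a guess rather than the demo's actual recipe: splat Gaussian centers into a voxel density grid and extract a mesh with marching cubes. The triangles can then back a Rapier trimesh collider on the three.js side while the splats are rendered as usual. Grid resolution, the iso level and the opacity weighting are all assumptions.

```python
import numpy as np
from skimage import measure  # marching cubes

def bake_collider_mesh(centers, opacities, grid_res=128, iso=0.5):
    """Bake a rough collision mesh from a Gaussian splat scene.

    centers: (N, 3) Gaussian means, opacities: (N,). Accumulates opacity into
    a voxel grid and runs marching cubes; iso controls how "solid" a region
    must be to produce a surface.
    """
    lo, hi = centers.min(axis=0), centers.max(axis=0)
    grid = np.zeros((grid_res,) * 3, dtype=np.float32)
    # Map each Gaussian center to a voxel index and accumulate its opacity.
    idx = ((centers - lo) / (hi - lo + 1e-8) * (grid_res - 1)).astype(int)
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), opacities)
    verts, faces, _, _ = measure.marching_cubes(grid, level=iso)
    # Rescale vertices from grid coordinates back to world coordinates.
    verts = verts / (grid_res - 1) * (hi - lo) + lo
    return verts, faces  # e.g. feed these to a trimesh collider in Rapier
```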
Bolt3D is accepted to @ICCVConference 🥳 see you in Hawaii!
📣📣📣 Neural Inverse Rendering from Propagating Light 💡 just won Best Student Paper award at #CVPR!!!
📢📢📢 Neural Inverse Rendering from Propagating Light 💡 Our CVPR Oral introduces the first method for multiview neural inverse rendering from videos of propagating light, unlocking applications such as relighting light propagation videos, geometry estimation, or light…
Many congratulations to @jianyuan_wang, @MinghaoChen23, @n_karaev, Andrea Vedaldi, Christian Rupprecht and @davnov134 for winning the Best Paper Award @CVPR for "VGGT: Visual Geometry Grounded Transformer" 🥇🎉 🙌🙌 #CVPR2025!!!!!!
🚀 Just wrapped up my internship at Google with a paper submission! LODGE is a new large-scale level-of-detail 3DGS method, enabling efficient rendering even on mobile devices. Thanks to the amazing team at Google Zurich for making this possible! 📄 lodge-gs.github.io
Very interesting. I trained my version of LVSM a couple of months ago and thought that the block artefacts were due to some bug in my reimplementation, but RayZer suggests they could have been due to inaccurate camera poses
Supervised learning has held 3D Vision back for too long. Meet RayZer — a self-supervised 3D model trained with zero 3D labels: ❌ No supervision of camera & geometry ✅ Just RGB images And the wild part? RayZer outperforms supervised methods (as 3D labels from COLMAP are noisy)…
Paper summary for ... Stochastic Interpolants, Flow Matching [Lipman et al. 2023], Rectified Flows [Liu et al. 2023], I-Conditional Flow Matching [Tong et al. 2024], Inversion by Direct Iteration [Delbracio and Milanfar 2024], and Iterative α-(de)Blending [Heitz et al. 2023]
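The shared core of these methods, in the independent-coupling / rectified-flow setting, fits in a few lines: pair a noise sample with a data sample, interpolate linearly at a random time t, and regress a velocity network toward x1 - x0; sampling then integrates the learned ODE from noise to data. A minimal sketch on a toy 2-D distribution, where the network size, step counts and ring-shaped target are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

# Velocity network v_theta(x_t, t) on toy 2-D data.
model = nn.Sequential(nn.Linear(3, 128), nn.SiLU(),
                      nn.Linear(128, 128), nn.SiLU(),
                      nn.Linear(128, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def sample_data(n):
    # Toy target: a noisy ring of radius 2 (stand-in for the real data distribution).
    a = torch.rand(n, 1) * 2 * torch.pi
    return torch.cat([2 * a.cos(), 2 * a.sin()], dim=1) + 0.05 * torch.randn(n, 2)

for step in range(1000):
    x1 = sample_data(256)            # data sample
    x0 = torch.randn_like(x1)        # noise sample (independent coupling)
    t = torch.rand(x1.shape[0], 1)   # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1       # linear interpolant
    target = x1 - x0                 # constant velocity along the straight path
    loss = ((model(torch.cat([xt, t], dim=1)) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: integrate dx/dt = v_theta(x, t) from t=0 (noise) to t=1 with Euler steps.
x = torch.randn(512, 2)
with torch.no_grad():
    for i in range(50):
        t = torch.full((x.shape[0], 1), i / 50)
        x = x + model(torch.cat([x, t], dim=1)) / 50
```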
On my way to Singapore for #ICLR2025 ! Looking forward to discussing generative video models and how to make them more controllable. We will also be presenting CubeDiff (cubediff.github.io) on Friday afternoon. Stop by and say hi :)
check out Sherwin's poster at ICLR this week (and his follow-up at CVPR)!
📢Excited to be at #ICLR2025 for our paper: VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Poster: Thu 3-5:30 PM (#134) Website: snap-research.github.io/vd3d/ Code: github.com/snap-research/… Also check out our #CVPR2025 follow-up AC3D: snap-research.github.io/ac3d/
Woah impressive. At a glance, the key seems to be collecting robot data across many different environments, mixing it with lab robot data, open-source robot datasets and non-robot web data. Exciting!
We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️
Introducing JEGAL👐 JEGAL can match hand gestures with words & phrases in speech/text. By only looking at hand gestures, JEGAL can perform tasks like determining who is speaking, or whether a keyword (e.g. beautiful) is gestured. More about our latest research on co-speech gestures 🧵👇
Last Friday was my last day at @GoogleAI - very grateful for an amazing experience. I thought I'd wear my propeller hat one last time - the reactions to it were divided between 'fun hat', 'congrats on your first week' and 'I can't take you seriously when you're wearing that' 😅

I find it cool that the Llama 4 architecture builds on the sparse mixture-of-experts architecture from almost 10 years ago (2017!): arxiv.org/pdf/1701.06538. Old papers for the win
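For reference, the core of that 2017 sparsely-gated MoE layer is a small gating network that picks the top-k experts per token and mixes only those experts' outputs, so parameter count grows with the number of experts while per-token compute stays roughly constant. A minimal sketch, omitting the noisy gating and load-balancing losses of the original paper; the expert width, expert count and k are arbitrary here:

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Minimal top-k sparsely-gated mixture-of-experts layer."""
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        logits = self.gate(x)                              # (tokens, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)  # keep only k experts per token
        weights = topk_vals.softmax(dim=-1)                # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            for e, expert in enumerate(self.experts):
                mask = idx == e                            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(SparseMoE(64)(tokens).shape)  # torch.Size([16, 64])
```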


Wow LVSM code is online - awesome stuff!
Our paper LVSM has been accepted as an oral presentation at #ICLR2025! See you in Singapore! We’ve just released the code and checkpoints—check it out here: github.com/haian-jin/LVSM.🚀