Alex Trevithick
@alextrevith
Research Scientist @NVIDIAAI. PhD @UCSanDiego. 4D Vision, Machine Learning, Generative Models.
🚀 Introducing SimVS: our new method that simplifies 3D capture! 🎯 3D reconstruction assumes consistency—no dynamics or lighting changes—but reality constantly breaks this assumption. ✨ SimVS takes a set of inconsistent images and makes them consistent with a chosen frame.
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models @ChrisWu6080 @RuiqiGao @poolio @alextrevith ChangxiZheng @jon_barron @holynski_
Poster #60 this afternoon, swing by!
Interactive looong-context reasoning still has a long way to go. We need progress across all axes: more data, bigger models, and smarter architectures. ∞-THOR is just the beginning: generate ∞-len trajectories, run agents online, train with feedback, and more! Let's push the limits🚀
"Foundation" models for embodied agents are all the rage but how to actually do complex looong context reasoning? Can we scale Beyond Needle(s) in the (Embodied) Haystack? ∞-THOR is an infinite len sim framework + guide on (new) architectures/training methods for VLA models
Supervised learning has held 3D Vision back for too long. Meet RayZer — a self-supervised 3D model trained with zero 3D labels: ❌ No supervision of camera & geometry ✅ Just RGB images And the wild part? RayZer outperforms supervised methods (as 3D labels from COLMAP are noisy)…
What's the difference between the oai and google image generators? Giving both of them the same image and the prompt "generate this image", Gemini is essentially the identity function, whereas oai changes the content. Does this indicate a continuous encoder for Gemini vs. a VQVAE for oai?
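A minimal sketch of that identity test, assuming hypothetical local files (input.png for the shared input, gemini_out.png and oai_out.png for the two outputs); a high PSNR against the input means the generator is acting close to the identity function.

```python
# Minimal sketch of the identity-function test above. File names are
# hypothetical: the shared input image plus each model's output for the
# prompt "generate this image", compared at a common resolution.
import numpy as np
from PIL import Image

def load(path, size=(512, 512)):
    """Load an image, resize it, and normalize to floats in [0, 1]."""
    return np.asarray(Image.open(path).convert("RGB").resize(size),
                      dtype=np.float64) / 255.0

def psnr(a, b):
    """Peak signal-to-noise ratio; higher = closer to the identity map."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else -10.0 * np.log10(mse)

src = load("input.png")  # hypothetical shared input
for out in ("gemini_out.png", "oai_out.png"):  # hypothetical model outputs
    print(f"{out}: PSNR vs. input = {psnr(src, load(out)):.2f} dB")
```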
🦣Easi3R: 4D Reconstruction Without Training! Limited 4D datasets? Take it easy. #Easi3R adapts #DUSt3R for 4D reconstruction by disentangling and repurposing its attention maps → making 4D reconstruction easier than ever! 🔗Page: easi3r.github.io
⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test time optimization. 🧵👇 (1/9)
Thanks @_akhaliq for sharing our ReCamMaster! ReCamMaster can re-capture existing videos with novel camera trajectories. Project page: jianhongbai.github.io/ReCamMaster/ Paper: huggingface.co/papers/2503.11…
ReCamMaster: Camera-Controlled Generative Rendering from a Single Video
Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for: ✅ Camera Pose Estimation ✅ Multi-view Depth Estimation ✅ Dense…
As one of the people who popularized the field of diffusion models, I am excited to share something that might be the “beginning of the end” of it. IMM has a single stable training stage, a single objective, and a single network — all are what make diffusion so popular today.
Today, we release Inductive Moment Matching (IMM): a new pre-training paradigm breaking the algorithmic ceiling of diffusion models. Higher sample quality. 10x more efficient. Single-stage, single network, stable training. Read more: lumalabs.ai/news/imm
I just pushed a new paper to arXiv. I realized that a lot of my previous work on robust losses and nerf-y things was dancing around something simpler: a slight tweak to the classic Box-Cox power transform that makes it much more useful and stable. It's this f(x, λ) here:
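For reference (the tweaked f(x, λ) itself is in the tweet's attached image and the paper, not reproduced here), the classic Box-Cox power transform being tweaked is:

```latex
% Classic Box-Cox power transform, for context only; the paper's tweaked
% f(x, \lambda) is in the attached image, not shown here. Requires amsmath.
f(x, \lambda) =
\begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt]
  \ln x, & \lambda = 0.
\end{cases}
```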
The raw chain of thought from DeepSeek is fascinating, really reads like a human thinking out loud. Charming and strange.
Preprint of the day: Asim et al., "MEt3R: Measuring Multi-View Consistency in Generated Images" -- geometric-rl.mpi-inf.mpg.de/met3r/ Lots of diffusion-based solutions for novel-view synthesis recently, but how good are they? A metric to compare how "3D" they truly are.
Excited to finally share this work w/ @SuryaGanguli. TL;DR: we find the first closed-form analytical theory that replicates the outputs of the very simplest diffusion models, with median pixel-wise r^2 values of 90%+. arxiv.org/abs/2412.20292
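A minimal sketch of how that headline number could be computed, under my assumption (not the paper's stated procedure) that r^2 is taken pixel-wise per image and the median is reported across images; the arrays here are random stand-ins, not real model outputs.

```python
# Sketch of the metric: per-image pixel-wise r^2 between an analytical
# prediction and the actual diffusion output, then the median over images.
# Arrays below are random stand-ins, not real data.
import numpy as np

def pixelwise_r2(pred, actual):
    """Coefficient of determination over one image's pixel values."""
    ss_res = np.sum((actual - pred) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
actual = rng.random((8, 32 * 32))                          # "model outputs"
pred = actual + 0.05 * rng.standard_normal(actual.shape)   # "theory"
scores = [pixelwise_r2(p, a) for p, a in zip(pred, actual)]
print(f"median pixel-wise r^2: {np.median(scores):.3f}")
```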
Training-free Video Enhancement: Achieved 🎉 Nice work with @oahzxl @shaowenqi126301 @VictorKaiWang1 @VitaGroupUT @YangYou1991 et al. Non-trivial enhancement, training-free, and plug-and-play 🥳 Blog: oahzxl.github.io/Enhance_A_Vide… (🧵1/6)
You know Generative 3D is moving fast when "early methods" were arXived 8 months ago 😂 [41] Realmdreamer: Text-driven 3d scene generation with inpainting and depth diffusion." arXiv:2404.07199, April 10, 2024.
Introducing MASt3R-SLAM, the first real-time monocular dense SLAM with MASt3R as a foundation. Easy to use like DUSt3R/MASt3R, from an uncalibrated RGB video it recovers accurate, globally consistent poses & a dense map. With @eric_dexheimer*, @AjdDavison (*Equal Contribution)