Weijie Wang
@wjwang2003
Undergraduate student at @ZJU_China, incoming PhD student at ZIP Lab, Zhejiang University
We're excited to introduce ZPressor, a bottleneck-aware compression module for scalable feed-forward 3DGS. Existing feed-forward 3DGS models struggle with dense views, facing performance drops & massive redundancy. ZPressor leverages Information Bottleneck Theory to compress…

With thanks to @janusch_patas for the recommendation. See our homepage for further details and results: aim-uofa.github.io/PMLoss
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting • We pinpoint an unexposed yet critical issue that leads to lower-quality 3D Gaussians predicted by feed-forward 3DGS models, rooted in the long-standing discontinuity issue of depth. • We introduce a…
4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos Abstract: We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT…
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting @Duochao_Shi, @wjwang2003, @donydchen, Zeyu Zhang, Jia-Wang Bian, @supremeZhuang, @chunhua_shen tl;dr: pre-trained 3D reconstruction models->pointmaps->geometry prior->loss arxiv.org/abs/2506.05327
Triangle Splatting for Real-Time Radiance Field Rendering is out! We bring triangles back to the spotlight for photorealistic, real-time novel view synthesis. arxiv.org/abs/2505.19175 🧵
What makes a good 3D scene representation? Instead of meshes or Gaussians, we propose Superquadrics to decompose 3D scenes into extremely compact representations ➡️ check out our paper for exciting use-cases in robotics 🤖 and GenAI super-dec.github.io w/ @efedele16 @mapo1
Thanks to @zhenjun_zhao for the recommendation. For further information and video results, please visit our project page at lhmd.top/zpressor
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS @wjwang2003, @donydchen, @SteveZeyuZhang, Duochao Shi, Akide Liu, @supremeZhuang tl;dr: views->anchor & support sets; support view info->anchor views->compressed latent state Z arxiv.org/abs/2505.23734
How to maintain a long-term memory for a 3D embodied AI agent across dynamic spatial-temporal environment changes in complex tasks? Introducing 3DLLM-Mem, a memory-enhanced 3D embodied agent that incrementally builds and maintains a task-relevant long-term memory while it…
Check out our recent work on RL for computer use agents! 💻
🔥 Introducing SPORT, a multimodal agent that explores tool usage without human annotation. It leverages step-wise DPO to further enhance tool-use capabilities following SFT. SPORT achieves improvements on the GTA and GAIA benchmarks. sport-agents.github.io
📢 (1/16) Introducing PaTH 🛣️, a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381