Zhuang Liu
@liuzhuang1234
Assistant Professor @PrincetonCS. researcher in deep learning, vision, models. previously @MetaAI, @UCBerkeley, @Tsinghua_Uni
I just finished reading “The Art of Doing Science and Engineering” by Richard Hamming. Here is my honest opinion, no hype: 1. the book is a niche one, on a subject that many others and I are interested in; not many books out there written by experts cover it. topics like how to…
Happiness is highly linked to how focused you are on what you're doing. A wandering mind is an unhappy mind.
From GPT to MoE: I reviewed & compared the main LLMs of 2025 in terms of their architectural design, from DeepSeek-V3 to Kimi K2. Multi-head Latent Attention, sliding window attention, new Post- & Pre-Norm placements, NoPE, shared-expert MoEs, and more... magazine.sebastianraschka.com/p/the-big-llm-…
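To make one of the components named above concrete, here is a minimal sketch of causal sliding-window attention; the window size, tensor shapes, and function names are illustrative assumptions, not taken from the article:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: query position i may attend to key positions
    in [i - window + 1, i] (causal, local attention)."""
    idx = torch.arange(seq_len)
    rel = idx[:, None] - idx[None, :]      # rel[i, j] = i - j
    return (rel >= 0) & (rel < window)     # True = allowed to attend

def sliding_window_attention(q, k, v, window: int):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    mask = sliding_window_mask(q.shape[-2], window).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Tiny usage example with random tensors:
q = k = v = torch.randn(1, 2, 8, 16)
out = sliding_window_attention(q, k, v, window=4)  # (1, 2, 8, 16)
```

Full-context attention is the special case window >= seq_len; smaller windows trade global context for memory and compute that scale linearly in sequence length.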
Thrilled to be part of such an incredible and talented team! It has been one month since I joined, and I’m inspired every day by our shared mission and commitment. Excited for what’s ahead!
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
This has been a recent effort for me as well. The role of q was not easy to reconcile fully. Another thing I found is that the ELBO math is much easier if I start the left side with the KL divergence.
I think I can reasonably claim I do understand the VAE algorithm.
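For reference, this is the "start the left side with KL" route mentioned above, written in standard VAE notation where q(z|x) is the approximate posterior (nothing here is specific to the thread beyond that):

```latex
\begin{align}
\mathrm{KL}\big(q(z\mid x)\,\|\,p(z\mid x)\big)
  &= \mathbb{E}_{q(z\mid x)}\!\big[\log q(z\mid x) - \log p(z\mid x)\big] \\
  &= \mathbb{E}_{q(z\mid x)}\!\big[\log q(z\mid x) - \log p(x,z)\big] + \log p(x)
  \;\ge\; 0,
\end{align}
\begin{equation}
\text{so}\qquad
\log p(x) \;\ge\; \mathbb{E}_{q(z\mid x)}\!\big[\log p(x,z) - \log q(z\mid x)\big]
\;=\; \mathrm{ELBO}(x).
\end{equation}
```

Equality holds exactly when q(z|x) = p(z|x), which is why maximizing the ELBO over q both tightens the bound on log p(x) and fits the approximate posterior.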
Congrats to @parastooabtahi, @tri_dao and Alex Lombardi on being named 2025 Google Research Scholars. 🎉 The @googleresearch scholars program funds world-class research conducted by early-career professors. bit.ly/4kvpvFx
Check out @zeng_boya 's nicely made short video on this! youtube.com/watch?v=3OGGjh…
Can diffusion models appear to be learning, when they’re actually just memorizing the training data? We show and investigate this phenomenon in the context of neural network weight generation, in our recent paper “Generative Modeling of Weights: Generalization or Memorization?"
All slides from the #cvpr2025 (@CVPR) workshop "How to Stand Out in the Crowd?" are now available on our website: sites.google.com/view/standoutc…
In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209. Schedule: sites.google.com/view/standoutc… We have an exciting lineup of invited talks and candid…
This is the syllabus of the course @geoffreyhinton and I taught in 1998 at the Gatsby Unit (just after it was founded). Notice anything?
Generative models are rapidly expanding into new domains. We find that when generating NN weights, they often *primarily* memorize the training data rather than create novel weights. I’m curious to further explore how properties of data impact generative models—always love to discuss! :)
Check out our new paper “Generative Modeling of Weights: Generalization or Memorization?” — we find that current diffusion-based neural network weight generators often memorize training checkpoints rather than learning a truly generalizable weight distribution!
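One simple way to probe for the memorization described above (an assumed diagnostic, not necessarily the paper's exact protocol): flatten generated samples and training checkpoints into vectors and check how close each generated sample is to its nearest training checkpoint. Similarities pinned near 1.0 suggest the sampler is replaying checkpoints rather than generalizing.

```python
import numpy as np

def nearest_train_similarity(generated: np.ndarray, train: np.ndarray) -> np.ndarray:
    """For each generated (flattened) weight vector, return the cosine
    similarity to its closest training checkpoint.

    generated: (n_gen, d), train: (n_train, d) flattened weight vectors.
    """
    g = generated / np.linalg.norm(generated, axis=1, keepdims=True)
    t = train / np.linalg.norm(train, axis=1, keepdims=True)
    sims = g @ t.T                 # (n_gen, n_train) cosine similarities
    return sims.max(axis=1)        # best-matching checkpoint per sample

# Illustrative usage with random stand-in "weights":
rng = np.random.default_rng(0)
train_w = rng.normal(size=(100, 512))
gen_w = train_w[:10] + 1e-3 * rng.normal(size=(10, 512))   # near-copies
print(nearest_train_similarity(gen_w, train_w))            # ~1.0 => memorization
```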
What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to…