Jun-Yan Zhu
@junyanz89
Assistant Professor at Generative Intelligence Lab @CMU_Robotics @CarnegieMellon. Understanding and creating pixels.
We've released the code for LegoGPT. This autoregressive model generates physically stable and buildable designs from text prompts by integrating physics laws and assembly constraints into LLM training and inference. This work is led by PhD students @AvaLovelace0, @kangle_deng,…
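For readers curious what folding such constraints into inference can look like, here is a minimal toy sketch of constraint-aware decoding: sample a candidate brick, reject it if it collides or is unsupported, and resample. The brick format and the `propose_next_brick`/`supported` helpers are hypothetical placeholders, not the released LegoGPT code.

```python
# Toy sketch of constraint-aware autoregressive decoding in the spirit of
# LegoGPT's inference-time checks. All helpers below are hypothetical stand-ins.
import random

GRID = 8  # toy 8x8x8 build volume

def propose_next_brick(state, rng):
    """Stand-in for sampling one brick (x, y, z, width, depth) from the LLM."""
    return (rng.randrange(GRID), rng.randrange(GRID), len(state) % GRID, 2, 4)

def collides(brick, state):
    x, y, z, w, d = brick
    return any(bx < x + w and x < bx + bw and by < y + d and y < by + bd and bz == z
               for bx, by, bz, bw, bd in state)

def supported(brick, state):
    """Toy stability proxy: a brick must rest on the ground or on a brick directly below."""
    x, y, z, w, d = brick
    if z == 0:
        return True
    return any(bz + 1 == z and bx < x + w and x < bx + bw and by < y + d and y < by + bd
               for bx, by, bz, bw, bd in state)

def decode(num_bricks=20, max_retries=50, seed=0):
    rng, state = random.Random(seed), []
    for _ in range(num_bricks):
        for _ in range(max_retries):          # reject and resample invalid bricks
            brick = propose_next_brick(state, rng)
            if not collides(brick, state) and supported(brick, state):
                state.append(brick)
                break
    return state

print(decode()[:5])
```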
🚀 #Nunchaku now supports FLUX.1-Kontext-dev! Edit images with just one sentence — style transfer, face swap, and more — now 2–3× faster and with 1/4 the VRAM. ✅ Works with ComfyUI & Diffusers 🔗 Demo: svdquant.mit.edu/kontext/ 📂 Code: github.com/mit-han-lab/nu… 🤗 4-bit #SVDQuant…
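A hedged usage sketch, assuming the Diffusers `FluxKontextPipeline` API; the exact class name, checkpoint id, and the Nunchaku 4-bit integration may differ, so check the linked repo for the real recipe.

```python
# Hedged sketch of one-sentence image editing with FLUX.1-Kontext-dev via Diffusers.
# Class name and parameters are assumptions; the Nunchaku 4-bit path is not shown here.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

source = load_image("input.png")                          # hypothetical local image
edited = pipe(
    image=source,
    prompt="Turn the photo into a watercolor painting",   # one-sentence edit
    guidance_scale=2.5,
).images[0]
edited.save("edited.png")
```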
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
Thank you, Coach Pop, for your brilliance on and off the court. We look forward to our next chapter together.
🚀 How to run 12B FLUX.1 on your local laptop with 2-3× speedup? Come check out our #SVDQuant (#ICLR2025 Spotlight) poster session! 🎉 🗓️ When: Friday, Apr 25, 10–12:30 (Singapore time) 📍 Where: Hall 3 + Hall 2B, Poster 169 📌 Poster: tinyurl.com/poster-svdquant 🎮 Demo:…
🚀 The 4-bit era has arrived! Meet #SVDQuant, our new W4A4 quantization paradigm for diffusion models. Now, 12B FLUX can run on a 16GB 4090 laptop without offloading—with 3x speedups over W4A16 models (like NF4) while maintaining top-tier image quality. #AI #Quantization. 1/7
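To make the W4A4 idea concrete, here is a small NumPy sketch of the core decomposition: keep a low-rank branch of the weights in high precision and quantize the residual (and the activations) to 4 bits. The rank, per-tensor scales, and rounding below are illustrative choices, not the paper's kernels.

```python
# Hedged NumPy sketch of the SVDQuant idea: W ≈ low-rank branch (high precision)
# + 4-bit residual, so both residual weights and activations can live in 4 bits.
import numpy as np

def quantize_4bit(x):
    """Symmetric per-tensor 4-bit quantization (integer levels -8..7)."""
    scale = np.abs(x).max() / 7.0 + 1e-12
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)   # weight (out x in)
X = rng.standard_normal((16, 512)).astype(np.float32)    # activations (batch x in)

# Low-rank branch kept in high precision absorbs the dominant directions.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 32
L1, L2 = U[:, :r] * S[:r], Vt[:r]            # W_lowrank = L1 @ L2
R = W - L1 @ L2                               # residual to be quantized

qR, sW = quantize_4bit(R)
qX, sX = quantize_4bit(X)

# 4-bit path for the residual + 16-bit path for the low-rank branch.
Y_approx = (qX @ qR.T) * (sX * sW) + X @ (L1 @ L2).T
Y_exact = X @ W.T
rel_err = np.linalg.norm(Y_approx - Y_exact) / np.linalg.norm(Y_exact)
print(f"relative error with rank-{r} branch: {rel_err:.3f}")
```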
Hi there, @phillip_isola and I wrote a short article (500 words) on Generative Modeling for the Open Encyclopedia of Cognitive Science. We briefly discuss the basic concepts of generative models and their applications. Don't miss @phillip_isola's hand-drawn cats in Figure 1!
Generative Modeling by Jun-Yan Zhu: doi.org/10.21428/e2759…
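As a toy illustration of the basic concept the article covers (mine, not from the article): a generative model fits a distribution to data and then samples new points from it.

```python
# Toy sketch of the core idea behind generative modeling: fit, then sample.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=0.5, size=1000)   # "training set"

mu, sigma = data.mean(), data.std()                # fit model parameters
samples = rng.normal(mu, sigma, size=5)            # generate new data
print(f"fit: mu={mu:.2f}, sigma={sigma:.2f}; samples: {np.round(samples, 2)}")
```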
I've updated my blog post to walk through the remaining technical details of our Surface Winding Numbers algorithm: the calculus behind the algorithm is now explained in a bit more detail. The post, paper, code, etc. is all here: nzfeng.github.io/research/WNoDS…
My SIGGRAPH 2023 presentation of "Winding Numbers on Discrete Surfaces", authored with @MarkGillespie64 and @keenanisalive , is now on YouTube: youtu.be/QnMx3s4_4WY
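For readers who want a refresher on the quantity being generalized, here is the textbook planar winding number, computed by summing the signed angles a closed polygon subtends at a query point; this is not the discrete-surface algorithm itself.

```python
# Classic 2-D winding number of a closed polygon around a point (textbook version).
import math

def winding_number(polygon, p):
    total = 0.0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i][0] - p[0], polygon[i][1] - p[1]
        bx, by = polygon[(i + 1) % n][0] - p[0], polygon[(i + 1) % n][1] - p[1]
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)  # signed turn per edge
    return round(total / (2.0 * math.pi))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(winding_number(square, (0.5, 0.5)))   # 1: inside
print(winding_number(square, (2.0, 2.0)))   # 0: outside
```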
Today's visual generative models are mere stochastic parrots of imagery, much like early language models, which could only statistically mimic short sentences with little reasoning. In contrast, modern large language models (LLMs) can comprehend long documents, keep track of…
Halfmoon is Reve Image — and it’s the best image model in the world 🥇 (🔊)
Excited to come out of stealth at @reveimage! Today's text-to-image/video models, in contrast to LLMs, lack logic. Images seem plausible initially but fall apart under scrutiny: painting techniques don't match, props don't carry meaning, and compositions lack intention. (1/4)
The Halfmoon 🌓 reveal: Congratulations to @reveimage on creating the world’s leading image generation model with Reve Image! Reve Image has been in the Artificial Analysis Image Arena over the past week and is the clear leader, beating strong competition including Recraft V3,…
We shared some early work towards a multi-modal and multi-task 3D foundation model at Roblox. The first release is a discrete shape tokenizer compatible with autoregressive modeling for text-to-shape. More to come soon. GitHub: github.com/Roblox/cube arXiv: arxiv.org/abs/2503.15475
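A rough sketch of what a discrete shape tokenizer provides, assuming a simple nearest-neighbor codebook lookup (not Cube's actual architecture): continuous shape features become token ids that an autoregressive model can predict, and the ids map back to quantized features for a shape decoder.

```python
# Hedged sketch of a discrete shape tokenizer: codebook size, feature source,
# and decoding are placeholders, not the released tokenizer.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 64)).astype(np.float32)   # 1024 discrete codes

def tokenize(latents):
    """Nearest-neighbor quantization: one token id per latent vector."""
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def detokenize(token_ids):
    """Inverse lookup back to (quantized) latents for a shape decoder."""
    return codebook[token_ids]

shape_latents = rng.standard_normal((256, 64)).astype(np.float32)  # e.g. per-patch features
tokens = tokenize(shape_latents)                # sequence for the autoregressive model
print(tokens[:10], detokenize(tokens).shape)
```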
Check out our @gradio demo based on @bfl_ml's FLUX model!! We fine-tune the model using our generated dataset to achieve tuning-free customization on new reference objects. huggingface.co/spaces/nupurkm…
Can we generate a training dataset of the same object in different contexts for customization? Check out our work SynCD, which uses Objaverse assets and shared attention in text-to-image models to do exactly that. cs.cmu.edu/~syncd-project/ w/ @xi_yin_ @junyanz89 @imisra_ @smnh_azadi
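A minimal PyTorch sketch of the shared-attention idea (shapes and the single attention call are illustrative, not the SynCD training code): each generated view attends over keys/values gathered from all views of the object, which encourages a consistent identity across the set.

```python
# Hedged sketch of shared attention across views of the same object.
import torch
import torch.nn.functional as F

B, N, D = 4, 77, 64                      # 4 views of one object, N tokens, D dims
q = torch.randn(B, N, D)
k = torch.randn(B, N, D)
v = torch.randn(B, N, D)

# Standard attention: each view only sees itself.
independent = F.scaled_dot_product_attention(q, k, v)

# Shared attention: every view attends over keys/values from all B views.
k_shared = k.reshape(1, B * N, D).repeat(B, 1, 1)
v_shared = v.reshape(1, B * N, D).repeat(B, 1, 1)
shared = F.scaled_dot_product_attention(q, k_shared, v_shared)

print(independent.shape, shared.shape)   # both (4, 77, 64)
```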
Holy Crap!! The journal extension of Expressive Image Generation with Rich Text has been accepted to IJCV! This extension expands the capability of rich text by enabling hyperlinks, texture fill, semantic image editing, and a new benchmark (yay, table with numbers)! Congrats…
Explore SVDQuant, it's time for 4bit inference: forbes.com/sites/johnwern…
Excited to bring back the 2nd Workshop on Visual Concepts at @CVPR 2025, this time with a call for papers! We welcome submissions on the following topics. See our website for more info: sites.google.com/stanford.edu/w… Join us & a fantastic lineup of speakers in Tennessee!
🚀 In my latest project, I developed a simple interactive WebUI tool, #VisCompare, to compare images/videos side by side across different models and methods, as shown in the video. 🌟 It's now open-source at github.com/mit-han-lab/Vi…! 🙌 Hope it can benefit the community—feedback and…
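For a sense of how little code a side-by-side viewer needs, here is a minimal Gradio sketch; it is an assumed two-image layout, not the actual VisCompare tool (see the linked repo for that).

```python
# Minimal Gradio sketch of a side-by-side comparison UI (illustrative only).
import gradio as gr

with gr.Blocks(title="Side-by-side comparison (sketch)") as demo:
    gr.Markdown("Upload outputs from two models/methods to compare them.")
    with gr.Row():
        left = gr.Image(label="Method A")
        right = gr.Image(label="Method B")

if __name__ == "__main__":
    demo.launch()
```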
Introducing ⚗️ Video Alchemist Our new video model supporting 👪 Multi-subject open-set personalization 🏞️ Foreground & background personalization 🚀 Without the need for inference-time tuning snap-research.github.io/open-set-video… [Results] 1. Sora girl rides a dinosaur on a savanna 🧵👇
Text prompts have shaped how we compose images with foundation models. But what if we could simply inject Visual Prompts instead? We introduce 🌟Visual Composer🌟 which achieves high-fidelity compositions of subjects and backgrounds with visual prompts! snap-research.github.io/visual-compose…
[1/4] Ever wondered what it would be like to use images—rather than text—to generate object and background compositions? We introduce VisualComposer, a method for compositional image generation with object-level visual prompts.
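One hedged way to picture the object-level visual prompt interface (the encoder and token counts below are assumptions, not VisualComposer's architecture): encode the background and each object reference image into embedding tokens and concatenate them into the conditioning sequence the generator cross-attends to.

```python
# Hedged sketch of object-level visual prompts as conditioning tokens.
import torch
import torch.nn as nn

class VisualPromptEncoder(nn.Module):
    def __init__(self, dim=768, tokens_per_image=4):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        self.to_tokens = nn.Linear(dim, tokens_per_image * dim)
        self.tokens_per_image, self.dim = tokens_per_image, dim

    def forward(self, image):                  # image: (B, 3, 32, 32) toy resolution
        feat = self.backbone(image)
        return self.to_tokens(feat).view(-1, self.tokens_per_image, self.dim)

enc = VisualPromptEncoder()
background = torch.randn(1, 3, 32, 32)
objects = [torch.randn(1, 3, 32, 32) for _ in range(2)]   # two subject images

cond = torch.cat([enc(background)] + [enc(o) for o in objects], dim=1)
print(cond.shape)   # (1, 12, 768): tokens a diffusion model could cross-attend to
```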