Jun-Yan Zhu
@junyanz89
Assistant Professor at Generative Intelligence Lab @CMU_Robotics @CarnegieMellon. Understanding and creating pixels.
We've released the code for LegoGPT. This autoregressive model generates physically stable and buildable designs from text prompts by integrating physics laws and assembly constraints into LLM training and inference. This work is led by PhD students @AvaLovelace0, @kangle_deng,…
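For readers curious what folding such constraints into inference can look like, here is a minimal toy sketch of constraint-aware decoding: sample a candidate brick, reject it if it collides or is unsupported, and resample. The brick format and the `propose_next_brick`/`supported` helpers are hypothetical placeholders, not the released LegoGPT code.

```python
# Toy sketch of constraint-aware autoregressive decoding in the spirit of
# LegoGPT's inference-time checks. All helpers below are hypothetical stand-ins.
import random

GRID = 8  # toy 8x8x8 build volume

def propose_next_brick(state, rng):
    """Stand-in for sampling one brick (x, y, z, width, depth) from the LLM."""
    return (rng.randrange(GRID), rng.randrange(GRID), len(state) % GRID, 2, 4)

def collides(brick, state):
    x, y, z, w, d = brick
    return any(bx < x + w and x < bx + bw and by < y + d and y < by + bd and bz == z
               for bx, by, bz, bw, bd in state)

def supported(brick, state):
    """Toy stability proxy: a brick must rest on the ground or on a brick directly below."""
    x, y, z, w, d = brick
    if z == 0:
        return True
    return any(bz + 1 == z and bx < x + w and x < bx + bw and by < y + d and y < by + bd
               for bx, by, bz, bw, bd in state)

def decode(num_bricks=20, max_retries=50, seed=0):
    rng, state = random.Random(seed), []
    for _ in range(num_bricks):
        for _ in range(max_retries):          # reject and resample invalid bricks
            brick = propose_next_brick(state, rng)
            if not collides(brick, state) and supported(brick, state):
                state.append(brick)
                break
    return state

print(decode()[:5])
```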
🚀 #Nunchaku now supports FLUX.1-Kontext-dev! Edit images with just one sentence — style transfer, face swap, and more — now 2–3× faster and with 1/4 the VRAM. ✅ Works with ComfyUI & Diffusers 🔗 Demo: svdquant.mit.edu/kontext/ 📂 Code: github.com/mit-han-lab/nu… 🤗 4-bit #SVDQuant…
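A hedged usage sketch, assuming the Diffusers `FluxKontextPipeline` API; the exact class name, checkpoint id, and the Nunchaku 4-bit integration may differ, so check the linked repo for the real recipe.

```python
# Hedged sketch of one-sentence image editing with FLUX.1-Kontext-dev via Diffusers.
# Class name and parameters are assumptions; the Nunchaku 4-bit path is not shown here.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

source = load_image("input.png")                          # hypothetical local image
edited = pipe(
    image=source,
    prompt="Turn the photo into a watercolor painting",   # one-sentence edit
    guidance_scale=2.5,
).images[0]
edited.save("edited.png")
```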
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
Thank you, Coach Pop, for your brilliance on and off the court. We look forward to our next chapter together.
🚀 How to run 12B FLUX.1 on your local laptop with 2-3× speedup? Come check out our #SVDQuant (#ICLR2025 Spotlight) poster session! 🎉 🗓️ When: Friday, Apr 25, 10–12:30 (Singapore time) 📍 Where: Hall 3 + Hall 2B, Poster 169 📌 Poster: tinyurl.com/poster-svdquant 🎮 Demo:…
🚀 The 4-bit era has arrived! Meet #SVDQuant, our new W4A4 quantization paradigm for diffusion models. Now, 12B FLUX can run on a 16GB 4090 laptop without offloading—with 3x speedups over W4A16 models (like NF4) while maintaining top-tier image quality. #AI #Quantization. 1/7
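To make the W4A4 idea concrete, here is a small NumPy sketch of the core decomposition: keep a low-rank branch of the weights in high precision and quantize the residual (and the activations) to 4 bits. The rank, per-tensor scales, and rounding below are illustrative choices, not the paper's kernels.

```python
# Hedged NumPy sketch of the SVDQuant idea: W ≈ low-rank branch (high precision)
# + 4-bit residual, so both residual weights and activations can live in 4 bits.
import numpy as np

def quantize_4bit(x):
    """Symmetric per-tensor 4-bit quantization (integer levels -8..7)."""
    scale = np.abs(x).max() / 7.0 + 1e-12
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)   # weight (out x in)
X = rng.standard_normal((16, 512)).astype(np.float32)    # activations (batch x in)

# Low-rank branch kept in high precision absorbs the dominant directions.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 32
L1, L2 = U[:, :r] * S[:r], Vt[:r]            # W_lowrank = L1 @ L2
R = W - L1 @ L2                               # residual to be quantized

qR, sW = quantize_4bit(R)
qX, sX = quantize_4bit(X)

# 4-bit path for the residual + 16-bit path for the low-rank branch.
Y_approx = (qX @ qR.T) * (sX * sW) + X @ (L1 @ L2).T
Y_exact = X @ W.T
rel_err = np.linalg.norm(Y_approx - Y_exact) / np.linalg.norm(Y_exact)
print(f"relative error with rank-{r} branch: {rel_err:.3f}")
```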
Hi there, @phillip_isola and I wrote a short article (500 words) on Generative Modeling for the Open Encyclopedia of Cognitive Science. We briefly discuss the basic concepts of generative models and their applications. Don't miss @phillip_isola's hand-drawn cats in Figure 1!
Generative Modeling by Jun-Yan Zhu: doi.org/10.21428/e2759…
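As a toy illustration of the basic concept the article covers (mine, not from the article): a generative model fits a distribution to data and then samples new points from it.

```python
# Toy sketch of the core idea behind generative modeling: fit, then sample.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=0.5, size=1000)   # "training set"

mu, sigma = data.mean(), data.std()                # fit model parameters
samples = rng.normal(mu, sigma, size=5)            # generate new data
print(f"fit: mu={mu:.2f}, sigma={sigma:.2f}; samples: {np.round(samples, 2)}")
```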
I've updated my blog post to walk through the remaining technical details of our Surface Winding Numbers algorithm: the calculus behind the algorithm is now explained in a bit more detail. The post, paper, code, etc. is all here: nzfeng.github.io/research/WNoDS…
My SIGGRAPH 2023 presentation of "Winding Numbers on Discrete Surfaces", authored with @MarkGillespie64 and @keenanisalive , is now on YouTube: youtu.be/QnMx3s4_4WY
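For readers who want a refresher on the quantity being generalized, here is the textbook planar winding number, computed by summing the signed angles a closed polygon subtends at a query point; this is not the discrete-surface algorithm itself.

```python
# Classic 2-D winding number of a closed polygon around a point (textbook version).
import math

def winding_number(polygon, p):
    total = 0.0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i][0] - p[0], polygon[i][1] - p[1]
        bx, by = polygon[(i + 1) % n][0] - p[0], polygon[(i + 1) % n][1] - p[1]
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)  # signed turn per edge
    return round(total / (2.0 * math.pi))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(winding_number(square, (0.5, 0.5)))   # 1: inside
print(winding_number(square, (2.0, 2.0)))   # 0: outside
```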
Today's visual generative models are mere stochastic parrots of imagery, much like early language models, which could only statistically mimic short sentences with little reasoning. In contrast, modern large language models (LLMs) can comprehend long documents, keep track of…
Halfmoon is Reve Image — and it’s the best image model in the world 🥇 (🔊)
Excited to come out of stealth at @reveimage! Today's text-to-image/video models, in contrast to LLMs, lack logic. Images seem plausible initially but fall apart under scrutiny: painting techniques don't match, props don't carry meaning, and compositions lack intention. (1/4)
The Halfmoon 🌓 reveal: Congratulations to @reveimage on creating the world’s leading image generation model with Reve Image! Reve Image has been in the Artificial Analysis Image Arena over the past week and is the clear leader, beating strong competition including Recraft V3,…
We shared some early work towards a multi-modal and multi-task 3D foundation model at Roblox. The first release is a discrete shape tokenizer compatible with autoregressive modeling for text-to-shape. More to come soon. GitHub: github.com/Roblox/cube arXiv: arxiv.org/abs/2503.15475
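A rough sketch of what a discrete shape tokenizer provides, assuming a simple nearest-neighbor codebook lookup (not Cube's actual architecture): continuous shape features become token ids that an autoregressive model can predict, and the ids map back to quantized features for a shape decoder.

```python
# Hedged sketch of a discrete shape tokenizer: codebook size, feature source,
# and decoding are placeholders, not the released tokenizer.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 64)).astype(np.float32)   # 1024 discrete codes

def tokenize(latents):
    """Nearest-neighbor quantization: one token id per latent vector."""
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def detokenize(token_ids):
    """Inverse lookup back to (quantized) latents for a shape decoder."""
    return codebook[token_ids]

shape_latents = rng.standard_normal((256, 64)).astype(np.float32)  # e.g. per-patch features
tokens = tokenize(shape_latents)                # sequence for the autoregressive model
print(tokens[:10], detokenize(tokens).shape)
```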
Check out our @gradio demo based on @bfl_ml's FLUX model!! We fine-tune the model using our generated dataset to achieve tuning-free customization on new reference objects. huggingface.co/spaces/nupurkm…
Can we generate a training dataset of the same object in different contexts for customization? Check out our work SynCD, which uses Objaverse assets and shared attention in text-to-image models to do exactly that. cs.cmu.edu/~syncd-project/ w/ @xi_yin_ @junyanz89 @imisra_ @smnh_azadi
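A minimal PyTorch sketch of the shared-attention idea (shapes and the single attention call are illustrative, not the SynCD training code): each generated view attends over keys/values gathered from all views of the object, which encourages a consistent identity across the set.

```python
# Hedged sketch of shared attention across views of the same object.
import torch
import torch.nn.functional as F

B, N, D = 4, 77, 64                      # 4 views of one object, N tokens, D dims
q = torch.randn(B, N, D)
k = torch.randn(B, N, D)
v = torch.randn(B, N, D)

# Standard attention: each view only sees itself.
independent = F.scaled_dot_product_attention(q, k, v)

# Shared attention: every view attends over keys/values from all B views.
k_shared = k.reshape(1, B * N, D).repeat(B, 1, 1)
v_shared = v.reshape(1, B * N, D).repeat(B, 1, 1)
shared = F.scaled_dot_product_attention(q, k_shared, v_shared)

print(independent.shape, shared.shape)   # both (4, 77, 64)
```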
Holy Crap!! The journal extension of Expressive Image Generation with Rich Text has been accepted to IJCV! This extension expands the capability of rich text by enabling hyperlinks, texture fill, semantic image editing, and a new benchmark (yay, table with numbers)! Congrats…
Explore SVDQuant, it's time for 4bit inference: forbes.com/sites/johnwern…
Excited to bring back the 2nd Workshop on Visual Concepts at @CVPR 2025, this time with a call for papers! We welcome submissions on the following topics. See our website for more info: sites.google.com/stanford.edu/w… Join us & a fantastic lineup of speakers in Tennessee!
🚀 In my latest project, I developed a simple interactive WebUI tool, #VisCompare, to compare images/videos side by side across different models and methods, as shown in the video. 🌟 It's now open-source at github.com/mit-han-lab/Vi…! 🙌 Hope it can benefit the community—feedback and…
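For a sense of how little code a side-by-side viewer needs, here is a minimal Gradio sketch; it is an assumed two-image layout, not the actual VisCompare tool (see the linked repo for that).

```python
# Minimal Gradio sketch of a side-by-side comparison UI (illustrative only).
import gradio as gr

with gr.Blocks(title="Side-by-side comparison (sketch)") as demo:
    gr.Markdown("Upload outputs from two models/methods to compare them.")
    with gr.Row():
        left = gr.Image(label="Method A")
        right = gr.Image(label="Method B")

if __name__ == "__main__":
    demo.launch()
```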
Introducing ⚗️ Video Alchemist Our new video model supporting 👪 Multi-subject open-set personalization 🏞️ Foreground & background personalization 🚀 Without the need for inference-time tuning snap-research.github.io/open-set-video… [Results] 1. Sora girl rides a dinosaur on a savanna 🧵👇
Text prompts have shaped how we compose images with foundation models. But what if we could simply inject Visual Prompts instead? We introduce 🌟Visual Composer🌟 which achieves high-fidelity compositions of subjects and backgrounds with visual prompts! snap-research.github.io/visual-compose…
[1/4] Ever wondered what it would be like to use images—rather than text—to generate object and background compositions? We introduce VisualComposer, a method for compositional image generation with object-level visual prompts.
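One hedged way to picture the object-level visual prompt interface (the encoder and token counts below are assumptions, not VisualComposer's architecture): encode the background and each object reference image into embedding tokens and concatenate them into the conditioning sequence the generator cross-attends to.

```python
# Hedged sketch of object-level visual prompts as conditioning tokens.
import torch
import torch.nn as nn

class VisualPromptEncoder(nn.Module):
    def __init__(self, dim=768, tokens_per_image=4):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        self.to_tokens = nn.Linear(dim, tokens_per_image * dim)
        self.tokens_per_image, self.dim = tokens_per_image, dim

    def forward(self, image):                  # image: (B, 3, 32, 32) toy resolution
        feat = self.backbone(image)
        return self.to_tokens(feat).view(-1, self.tokens_per_image, self.dim)

enc = VisualPromptEncoder()
background = torch.randn(1, 3, 32, 32)
objects = [torch.randn(1, 3, 32, 32) for _ in range(2)]   # two subject images

cond = torch.cat([enc(background)] + [enc(o) for o in objects], dim=1)
print(cond.shape)   # (1, 12, 768): tokens a diffusion model could cross-attend to
```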