Gaurav Parmar
@GauravTParmar
PhD @ CMU
[1/4] Ever wondered what it would be like to use images—rather than text—to generate object and background compositions? We introduce VisualComposer, a method for compositional image generation with object-level visual prompts.
🚨 The era of infinite internet data is ending, so we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
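For reference, the two training objectives being contrasted are, in their textbook forms (notation mine, not from the thread):

```latex
% Autoregressive (next-token) objective: maximize the log-likelihood of each
% token given its prefix.
\mathcal{L}_{\mathrm{AR}} = -\,\mathbb{E}_{x}\left[\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)\right]

% Diffusion (denoising) objective: predict the noise added to a clean sample
% at a randomly drawn noise level t.
\mathcal{L}_{\mathrm{Diff}} = \mathbb{E}_{x,\;\epsilon \sim \mathcal{N}(0, I),\; t}
\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar\alpha_t}\,x + \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\right)\right\rVert^2
```

The thread's claim is about which of these makes better use of a fixed budget: AR when compute is the limit, diffusion when unique data is.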
🚀 Career Update After years pushing the boundaries of Generative AI at some of the world’s top companies -> I’m going startup. I’ve joined @DecartAI as a founding team member, leading the charge to build our San Francisco office from the ground up. decart.ai
Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
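A minimal sketch of that training-time rollout as I read it (toy model, hypothetical names; not the authors' code): generate each frame from the model's own previous output while reusing a KV cache, then apply the loss to the whole rollout so training sees the same process as inference.

```python
import torch
import torch.nn as nn

class ToyFramePredictor(nn.Module):
    """Stand-in for an autoregressive video diffusion transformer (illustrative only)."""
    def __init__(self, dim=16):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, prev_frame, past_kv=None):
        # A real model would attend over `past_kv`; here we just carry it along.
        past_kv = [] if past_kv is None else past_kv
        past_kv = past_kv + [prev_frame.detach()]   # grow the "KV cache"
        return torch.tanh(self.proj(prev_frame)), past_kv

def self_forcing_rollout(model, first_frame, target_video, num_frames):
    """Unroll the model on its OWN predictions (as at inference), reusing the cache,
    then compute the loss on the rollout so gradients see the unrolled process."""
    kv_cache, frames = None, [first_frame]
    for _ in range(1, num_frames):
        pred, kv_cache = model(frames[-1], past_kv=kv_cache)
        frames.append(pred)
    rollout = torch.stack(frames, dim=1)            # (batch, frames, dim)
    return torch.mean((rollout - target_video) ** 2)

model = ToyFramePredictor()
x0 = torch.randn(2, 16)                             # first frame latent
target = torch.randn(2, 8, 16)                      # ground-truth video latents
loss = self_forcing_rollout(model, x0, target, num_frames=8)
loss.backward()
```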
🚀 How to run 12B FLUX.1 on your local laptop with 2-3× speedup? Come check out our #SVDQuant (#ICLR2025 Spotlight) poster session! 🎉 🗓️ When: Friday, Apr 25, 10–12:30 (Singapore time) 📍 Where: Hall 3 + Hall 2B, Poster 169 📌 Poster: tinyurl.com/poster-svdquant 🎮 Demo:…
🚀 The 4-bit era has arrived! Meet #SVDQuant, our new W4A4 quantization paradigm for diffusion models. Now, 12B FLUX can run on a 16GB 4090 laptop without offloading—with 3x speedups over W4A16 models (like NF4) while maintaining top-tier image quality. #AI #Quantization. 1/7
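For intuition only, here is what plain W4A4 (4-bit weights, 4-bit activations) looks like numerically. This is a naive per-tensor scheme, not SVDQuant itself, which builds on top of 4-bit quantization to preserve image quality.

```python
import torch

def quantize_4bit(x):
    """Symmetric 4-bit quantization to the integer range [-8, 7].
    Per-tensor scale for simplicity; real kernels use finer granularity."""
    scale = x.abs().amax() / 7.0 + 1e-8
    q = torch.clamp(torch.round(x / scale), -8, 7)
    return q, scale

def w4a4_linear(x, w):
    """Emulated W4A4 matmul: quantize activations and weights, multiply, rescale."""
    qx, sx = quantize_4bit(x)
    qw, sw = quantize_4bit(w)
    return (qx @ qw.t()) * (sx * sw)       # integer-like matmul, then dequantize

x = torch.randn(4, 64)
w = torch.randn(128, 64)
y_fp = x @ w.t()
y_q = w4a4_linear(x, w)
print((y_fp - y_q).abs().mean())           # error of the naive scheme
```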
This is really cool
Decentralized Diffusion Models power stronger models trained on more accessible infrastructure. DDMs mitigate the networking bottleneck that locks training into expensive and power-hungry centralized clusters. They scale gracefully to billions of parameters and generate…
Text prompts have shaped how we compose images with foundation models. But what if we could simply inject Visual Prompts instead? We introduce 🌟Visual Composer🌟 which achieves high-fidelity compositions of subjects and backgrounds with visual prompts! snap-research.github.io/visual-compose…
One of the motivating applications of this project was to emulate a "photo album" experience. With VisualComposer, you can create image variations from one image. But it also became a more general tool: you can not only generate image variations, but also compose any visual…
HandsOnVLM: An in-context action prediction assistant for daily activities. It enables predicting future interaction trajectories of human hands in a scene given natural language queries. Evaluations across 100s of diverse scenarios in homes, offices, and outdoors! 1/n
Current vision systems use fixed-length representations for all images. In contrast, human intelligence and LLMs (e.g., OpenAI o1) adjust their compute budget based on the input. Since different images demand different processing & memory, how can we enable vision systems to be adaptive? 🧵
As a founding researcher, I have seen @SkildAI grow exponentially. We changed offices 3 times, grew 10x in human (and robot) numbers, and became a unicorn in less than a year. If you want to scale up robotics and work with a cracked team of engineers and scientists, come to @SkildAI.
Thrilled to announce @SkildAI! Over the past year, @gupta_abhinav_ and I have been working with our top-tier team to build an AI foundation model grounded in the physical world. Today, we’re taking Skild AI out of stealth with $300M in Series A funding: forbes.com/sites/rashishr…
The latent space of earlier generative models like GANs can linearly encode concepts of the data. What if the data were model weights? We present weights2weights, a subspace in diffusion weights that behaves as an interpretable latent space over customized diffusion models.
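A rough sketch of the underlying idea with toy data (plain PCA over flattened weights; the paper's actual pipeline is more involved, and all names here are illustrative): treat each customized model's weights as a data point, fit a linear subspace, and edit by moving along its directions.

```python
import numpy as np

# Toy stand-in: N "customized models", each flattened to a D-dim weight vector.
rng = np.random.default_rng(0)
N, D, K = 200, 1024, 16
W = rng.normal(size=(N, D))

# Fit a K-dim linear subspace over model weights via PCA (SVD of centered data).
mean = W.mean(axis=0)
U, S, Vt = np.linalg.svd(W - mean, full_matrices=False)
basis = Vt[:K]                         # K principal directions in weight space

# "Latent code" of one customized model, and an edit along a subspace direction.
z = (W[0] - mean) @ basis.T            # project weights into the subspace
z_edited = z + 2.0 * np.eye(K)[3]      # move along the 4th direction
w_edited = mean + z_edited @ basis     # map back to a full weight vector
```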
WALT3D has been accepted as an Oral at #CVPR (top 90 out of 12,000)! WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion Project Page: cs.cmu.edu/~walt3d Key Idea: Convert your image to 3D under severe occlusions
Our new inversion method facilitates interactive image editing with few-step diffusion models 🏃♀️🏃 I played with it all morning, so much fun -- less than 2 sec per edit 😲 Try the demo! Project page: garibida.github.io/ReNoise-Invers… Cool demo: huggingface.co/spaces/garibid…
Introducing ReNoise Inversion! With the recent diffusion models trained to generate images with a few steps, interactive image editing is within our reach. Our method unlocks interactive image editing by inverting images to the noise space of fast diffusion models 🚀
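My loose reading of how such an inversion can work, sketched with a simplified DDIM-style scheduler and a dummy noise predictor (illustrative, not the authors' implementation): reverse each sampling step of the few-step model, refining the noise estimate at every step with a few fixed-point iterations.

```python
import torch

@torch.no_grad()
def renoise_style_inversion(eps_model, x0, alphas_bar, renoise_iters=3):
    """Invert a clean latent x0 back to the model's noise space by reversing each
    sampling step; the inner loop re-estimates eps at the (noisier) target point."""
    x = x0
    for i in range(len(alphas_bar) - 1):
        a_cur, a_next = alphas_bar[i], alphas_bar[i + 1]   # decreasing \bar{alpha}
        x_next = x
        for _ in range(renoise_iters):
            eps = eps_model(x_next, i + 1)                  # noise prediction at the target step
            x0_hat = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
            x_next = a_next.sqrt() * x0_hat + (1 - a_next).sqrt() * eps
        x = x_next
    return x                                                # latent in the noise space

# Tiny runnable example with a dummy noise predictor.
eps_model = lambda x, t: 0.1 * torch.ones_like(x)
alphas_bar = torch.tensor([0.99, 0.7, 0.3, 0.05])
z = renoise_style_inversion(eps_model, torch.randn(1, 4), alphas_bar)
```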
Testing the new pix2pix-Turbo in real time, a very interesting GAN architecture that leverages the SD-Turbo model. Here I'm using the edge2image LoRA with single-step inference 🤯
[1/2] We’ve released the code for #pix2pixturbo and #CycleGANTurbo. These conditional GANs adapt a text-to-image model such as SD-Turbo for both paired and unpaired image translation in a single step (0.11 sec on A100 and 0.29 sec on A6000). Try our code and the…
One-Step Image Translation with Text-to-Image Models In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning.
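For context on what a single step means in practice, here is plain one-step generation with the SD-Turbo backbone through diffusers. This is the base model the paper adapts, not the released pix2pix-Turbo/CycleGAN-Turbo code.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load SD-Turbo (the distilled backbone) and run a single denoising step.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a photo of a red barn in a snowy field",
    num_inference_steps=1,      # one-step generation
    guidance_scale=0.0,         # SD-Turbo is trained to run without CFG
).images[0]
image.save("sd_turbo_one_step.png")
```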