Ziyang Chen
@CzyangChen
Research Scientist @LumaLabsAI. Ph.D. @UMich in multimodal learning and audio-visual learning; previously @Adobe and @AIatMeta
🎥 Introducing MultiFoley, a video-aware audio generation method with multimodal controls! 🔊 We can:
⌨️ Make a typewriter sound like a piano 🎹
🐱 Make a cat's meow sound like a lion's roar! 🦁
⏱️ Perfectly time existing SFX 💥 to a video
Come join us at poster #285 this afternoon #CVPR2025!
Sharing our #CVPR2025 paper: "GPS as a Control Signal for Image Generation"! 🛰️+✍️ We turn the GPS tag stored in a photo's EXIF data into a control signal for diffusion models, so they don't just know what you asked for, but also where the image should look like it was taken. Come see our poster at…
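A rough sketch of how a GPS tag could enter a diffusion model as conditioning: encode latitude/longitude with sinusoidal Fourier features and concatenate the result with the prompt embedding. The encoding below is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

def gps_embedding(lat, lon, num_freqs=16):
    """Encode a (lat, lon) pair with sinusoidal Fourier features,
    similar in spirit to positional encodings; a hypothetical scheme,
    not necessarily what the paper uses."""
    coords = np.array([lat / 90.0, lon / 180.0])          # normalize to [-1, 1]
    freqs = 2.0 ** np.arange(num_freqs)                    # geometric frequency ladder
    angles = coords[:, None] * freqs[None, :] * np.pi
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).ravel()

# Example: embed the EXIF GPS tag of a photo taken in Nashville
emb = gps_embedding(36.1627, -86.7816)
print(emb.shape)  # (64,) -> could be concatenated with the text embedding as conditioning
```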
Come and join us at CVPR this year!
Heading to @CVPR in Nashville next week? Join us for an open-bar happy hour at Barstool Nashville. Meet the Luma AI team and connect with fellow innovators in AI. Space is limited, RSVP now. lu.ma/5s0o2hlh
Introducing Modify Video. Reimagine any video. Shoot it in post with director-grade control over style, character, and setting. Restyle expressive performances, swap entire worlds, or redesign the frame to your vision. Shoot once. Shape infinitely.
Will VLMs adhere strictly to their learned priors, unable to perform visual reasoning on content that has never existed on the Internet? We propose ViLP, a benchmark designed to probe the visual-language priors of VLMs by constructing Question-Image-Answer triplets that deliberately…
I’m on the PhD internship market for Spr/Summer 2025! I have experience in multimodal AI (EHR, X-ray, text), explainability for image models w/ genAI, clinician-AI interaction (surveyed 700+ doctors), and tabular foundation models. Please reach out if you think there’s a fit!
Introducing 👀Stereo4D👀 A method for mining 4D from internet stereo videos. It enables large-scale, high-quality, dynamic, *metric* 3D reconstructions, with camera poses and long-term 3D motion trajectories. We used Stereo4D to make a dataset of over 100k real-world 4D scenes.
I'll be presenting "Images that Sound" today at #NeurIPS2024! East Exhibit Hall A-C #2710. Come say hi to me and @andrewhowens :) (@CzyangChen sadly could not make it, but will be there in spirit :') )
These spectrograms look like images, but can also be played as a sound! We call these images that sound. How do we make them? Look and listen below to find out, and to see more examples!
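To see why such an image is playable, here is a minimal sketch (not the paper's generation method) that treats a grayscale image as a log-magnitude spectrogram and inverts it to audio with Griffin-Lim; the dB mapping and sample rate below are assumptions for illustration.

```python
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

# Treat a grayscale image as a log-magnitude spectrogram and invert it to audio.
# This only illustrates why a spectrogram can be "played"; the paper generates
# such spectrograms with diffusion models, which is not shown here.
img = np.array(Image.open("canvas.png").convert("L"), dtype=np.float32) / 255.0
mag = librosa.db_to_amplitude(img * 80.0 - 80.0)    # map [0, 1] to [-80, 0] dB
audio = librosa.griffinlim(mag, n_iter=64, hop_length=256)
sf.write("canvas.wav", audio, 22050)                # sample rate is a free choice here
```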
new paper! 🗣️Sketch2Sound💥 Sketch2Sound can create sounds from sonic imitations (i.e., a vocal imitation or a reference sound) via interpretable, time-varying control signals. paper: arxiv.org/abs/2412.08550 web: hugofloresgarcia.art/sketch2sound
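For context, one rough way to pull time-varying control curves (loudness, pitch, spectral centroid) out of a vocal imitation with librosa; a generic approximation of such control signals, not the authors' pipeline.

```python
import numpy as np
import librosa

# Rough time-varying control curves from a vocal imitation (illustrative only).
y, sr = librosa.load("vocal_imitation.wav", sr=None)
hop = 512
loudness = librosa.amplitude_to_db(librosa.feature.rms(y=y, hop_length=hop)[0])
f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                        fmax=librosa.note_to_hz("C7"), hop_length=hop)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop)[0]
controls = np.stack([loudness, np.nan_to_num(f0), centroid])  # (3, frames)
```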
Check out the awesome work from @TianweiY!
Video diffusion models generate high-quality videos but are too slow for interactive applications. We @MIT_CSAIL @AdobeResearch introduce CausVid, a fast autoregressive video diffusion model that starts playing the moment you hit "Generate"! A thread 🧵
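The gist of "starts playing the moment you hit Generate": produce frames in causal chunks, each conditioned only on what has already been generated, so playback can begin immediately. `denoise_chunk` below is a hypothetical stand-in for the model, not CausVid's actual interface.

```python
import torch

def generate_streaming(denoise_chunk, num_chunks, chunk_frames=4, h=64, w=64):
    """Minimal sketch of autoregressive ("causal") video generation.

    denoise_chunk(noise, context) is a hypothetical callable that denoises a
    chunk of frames conditioned on previously generated frames (or None)."""
    history = []
    for _ in range(num_chunks):
        noise = torch.randn(chunk_frames, 3, h, w)
        context = torch.cat(history, dim=0) if history else None
        chunk = denoise_chunk(noise, context)   # condition only on past frames
        history.append(chunk)
        yield chunk                             # can be displayed right away
```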
What happens when you train a video generation model to be conditioned on motion? Turns out you can perform "motion prompting," just like you might prompt an LLM! Doing so enables many different capabilities. Here are a few examples – check out this thread 🧵 for more results!
We present Global Matching Random Walks, a simple self-supervised approach to the Tracking Any Point (TAP) problem, accepted to #ECCV2024. We train a global matching transformer to find cycle consistent tracks through video via contrastive random walks (CRW).
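The core contrastive-random-walk idea, as a simplified sketch: build softmax transition matrices between consecutive frames, walk forward through the video and back, and penalize round trips that do not return to the starting point. Shapes and the loss form here are a simplification of the general recipe, not the paper's exact code.

```python
import torch
import torch.nn.functional as F

def crw_cycle_loss(feats, temperature=0.07):
    """Cycle-consistency loss via contrastive random walks (simplified sketch).

    feats: (T, N, D) L2-normalized features for N points in each of T frames.
    Walk forward through all frames and back; the round-trip transition
    matrix should be close to the identity."""
    T, N, _ = feats.shape
    frames = list(range(T)) + list(range(T - 2, -1, -1))   # forward, then back
    walk = torch.eye(N, device=feats.device)
    for a, b in zip(frames[:-1], frames[1:]):
        affinity = feats[a] @ feats[b].t() / temperature    # (N, N) similarities
        walk = walk @ F.softmax(affinity, dim=1)            # one random-walk step
    targets = torch.arange(N, device=feats.device)
    return F.nll_loss(torch.log(walk + 1e-8), targets)      # diagonal should dominate
```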
📢Presenting 𝐃𝐄𝐏𝐈𝐂𝐓: Diffusion-Enabled Permutation Importance for Image Classification Tasks #ECCV2024 We use permutation importance to compute dataset-level explanations for image classifiers using diffusion models (without access to model parameters or training data!)
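The underlying statistic is classic permutation importance: perturb one feature (here, a concept) across the dataset and measure the drop in model performance. The tabular sketch below only illustrates that statistic; in DEPICT the "permutation" is realized by editing images with diffusion models rather than shuffling columns.

```python
import numpy as np

def permutation_importance(model, X, y, feature_idx, n_repeats=5, rng=None):
    """Shuffle one feature across the dataset and measure the accuracy drop.

    `model` is any classifier with a predict() method (hypothetical here);
    X is an (n_samples, n_features) array, y the true labels."""
    rng = rng or np.random.default_rng(0)
    base = np.mean(model.predict(X) == y)                   # baseline accuracy
    drops = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, feature_idx] = rng.permutation(Xp[:, feature_idx])
        drops.append(base - np.mean(model.predict(Xp) == y))
    return float(np.mean(drops))                            # larger drop = more important
```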
This Saturday, be sure to check out @CzyangChen with our Geo Regional Asia Group! Learn more: cohere.com/events/cohere-…
Mark your calendars! July 13th, join Ziyang Chen for a presentation on "Images that Sound: Composing Images and Sounds on a Single Canvas." 👨‍💻 Learn more: cohere.com/events/cohere-…