Perry Zhang
@PY_Z001
PhD student at UCSD CSE. Working on video generation architecture.
Don’t be distracted by RL; focus on developing a better gradient estimator for E_p[f(X)]?
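For context on that formula: the classic baseline for estimating ∇θ E_{p_θ}[f(X)] is the score-function (REINFORCE) estimator, ∇θ E_{p_θ}[f(X)] = E_{p_θ}[f(X) ∇θ log p_θ(X)]. A minimal PyTorch sketch; the Gaussian distribution and the choice of f are illustrative assumptions, not from the tweet:

```python
import torch

# Score-function (REINFORCE) estimator for grad_theta E_{p_theta}[f(X)].
# Illustrative setup (assumed): p_theta = Normal(theta, 1), f(x) = x**2.
theta = torch.tensor(0.5, requires_grad=True)

def f(x):
    return x ** 2  # arbitrary test function

dist = torch.distributions.Normal(theta, 1.0)
x = dist.sample((10_000,))          # samples carry no gradient
log_prob = dist.log_prob(x)         # differentiable w.r.t. theta

# Surrogate loss whose gradient equals the REINFORCE estimate:
# mean of f(X) * d/dtheta log p_theta(X)
surrogate = (f(x).detach() * log_prob).mean()
surrogate.backward()

# For Normal(theta, 1): E[X^2] = theta^2 + 1, so the true gradient is
# 2 * theta = 1.0. theta.grad should match up to Monte Carlo noise.
print(theta.grad)
```

The notorious weakness of this estimator is its variance, which is exactly why "a better gradient estimator" is an interesting research target.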
I learned a lot from NATTEN!
Watch my talk about NATTEN on @GPU_MODE this Saturday at 3PM ET / noon PT. I'll go over all the exciting new features we shipped very recently, especially our Hopper and Blackwell FNA kernels, now speeding up video / world models by up to 2.6X e2e! youtube.com/watch?v=mF_H_J
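For readers new to NATTEN: neighborhood attention restricts each query to a local window of keys instead of the full sequence. Below is a pure-PyTorch sketch of the idea on a 1D sequence; it conveys the concept only and is not NATTEN's fused kernels or its actual API (this masked variant also differs at the edges, where NATTEN shifts the window so every query sees exactly `window` keys):

```python
import torch
import torch.nn.functional as F

def local_attention_1d(q, k, v, window=7):
    """Each token attends only to keys within window // 2 positions.
    Conceptual reference: real NATTEN kernels never materialize the
    full (L, L) score matrix."""
    L, d = q.shape
    scores = (q @ k.T) / d ** 0.5                     # (L, L)
    idx = torch.arange(L)
    mask = (idx[None, :] - idx[:, None]).abs() > window // 2
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(16, 32)
out = local_attention_1d(q, k, v)
print(out.shape)  # torch.Size([16, 32])
```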
📣 We’ve had three papers accepted at #ICML2025, Hao-AI-Lab is sending @haozhangml to attend ICML in person😂! If you're around, please find Hao at the venue and chat with him about video diffusion, LLM agents, and efficient attention 👋🧠 🎬 Fast Video Generation with Sliding…
Heading to ICML next week (Monday - Thursday). Down to chat research, ideas, anything cool, or just hang 😄📍🎯
🚀 Attention is the bottleneck in video DiTs—5 s of 720p = 100K+ tokens, quadratic cost blows up fast. Sparse/linear attention is 🔑 for long-context world models. 🧠 Track relevant papers in our Awesome-Video-Attention repo → github.com/hao-ai-lab/Aws… #WorldModel #VideoAI
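Back-of-envelope math behind that "100K+ tokens" figure; the VAE compression factors and patch size below are typical of recent video DiTs and are assumptions here, not numbers from the tweet:

```python
# Rough token count / attention cost for a video DiT.
# Assumed: 8x spatial + 4x temporal VAE compression, 2x2 patchify, 24 fps.
frames = 5 * 24                      # 5 s of video
h, w = 720, 1280                     # 720p
lat_t = frames // 4                  # temporal compression -> 30
lat_h, lat_w = h // 8, w // 8        # spatial compression -> 90 x 160
tokens = lat_t * (lat_h // 2) * (lat_w // 2)   # 2x2 patch embedding
print(tokens)                        # 30 * 45 * 80 = 108,000 tokens

# Full self-attention scales quadratically in tokens:
print(f"{tokens ** 2:.2e} query-key pairs per attention layer")  # ~1.2e10
```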
🔧🤖 New wave of open-source LLMs like DeepSeek-R1-0528 and Qwen3-235B-A22B are leveling up with stronger agentic performance. We test them in head-to-head gameplay — the upgraded DeepSeek-R1-0528 outsmarts strong reasoning models like o4-mini across several games and it nearly…
I will be giving a talk in @GPU_MODE tomorrow (May 31 12pm PST) about FastVideo/STA/VSA. Come if you're interested! youtube.com/watch?v=x44iGp…

amazing!
(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint…
Announcing FastVideo V1, a unified framework for accelerating video generation. FastVideo V1 offers:
- A simple, consistent Python API
- State-of-the-art model performance optimizations
- Optimized implementations of popular models
Blog: hao-ai-lab.github.io/blogs/fastvide…
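A minimal sketch of what "a simple, consistent Python API" might look like in practice. The `VideoGenerator` class, method names, and model id below are assumptions based on the announcement, not verified against the actual `fastvideo` package; check the blog and docs for the real entry points:

```python
# Hypothetical FastVideo-style usage; names are assumed, not confirmed.
from fastvideo import VideoGenerator  # assumed import path

generator = VideoGenerator.from_pretrained(
    "FastVideo/FastHunyuan-diffusers",  # assumed model id
    num_gpus=1,
)
video = generator.generate_video(
    "a corgi surfing a wave at sunset",
    num_frames=125,
)
```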
STA has been accepted to ICML 2025!!
🎥 Video DiTs are painfully slow: HunyuanVideo takes 16 min to generate a 5s 720P video on H100. 🤯 Announcing Sliding Tile Attention (STA):
* Accelerate 3D full attention (FA3) by up to 10x
* Slash the end-to-end time from 16 --> 5 mins
* NO extra training. NO quality loss!…
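The core idea, roughly: group tokens into tiles and let each query tile attend only to a local window of key tiles, so attention stays dense within contiguous blocks (GPU-friendly) while distant tiles are skipped. A simplified 1D mask construction to illustrate the concept; the real STA works on 3D video tokens and skips masked tiles at the kernel level rather than materializing any mask:

```python
import torch

def sliding_tile_mask(num_tokens, tile=4, window_tiles=3):
    """Boolean mask: token i may attend to token j iff their tiles lie
    within window_tiles // 2 of each other. 1D toy version of the
    sliding-tile idea (assumed parameters, not STA's actual config)."""
    tile_id = torch.arange(num_tokens) // tile
    dist = (tile_id[None, :] - tile_id[:, None]).abs()
    return dist <= window_tiles // 2

mask = sliding_tile_mask(16)
print(mask.int())  # block-banded pattern: dense tiles, sparse globally
```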
Thrilled to share recent research from our fascinating lab members and collaborators at #ICLR2025! 🚀✨ Come say hi in our poster sessions and dive into discussions on LLM agents, reasoning, long-context training, efficient inference, and more. We’re excited to share, learn and…
Let me tell a real story of my own with @nvidia. Back in 2014, I was a wide-eyed first-year PhD student at CMU in @ericxing's lab, trying to train AlexNet on CPU (don’t ask why). I had zero access to GPUs. NVIDIA wasn’t yet "THE NVIDIA" we know today—no DGXs, no trillion-dollar…
We are beyond honored and thrilled to welcome the amazing new @nvidia DGX B200 💚 at @HDSIUCSD @haoailab. This generous gift from @nvidia is an incredible recognition and an opportunity for the UCSD MLSys community and @haoailab to push the boundaries of AI + System research. 💪
😍😍😍😍😍
When Ilya Sutskever once explained why next-word prediction leads to intelligence, he made a metaphor: if you can piece together the clues and deduce the criminal’s name on the last page, you have a real understanding of the story. 🕵️♂️ Inspired by that idea, we turned to Ace…
sorry, but you don’t get to insult people at @AIatMeta after being the most closed-source AI company in the world for the last two years. Meta is literally THE company, along with @MistralAI, that stayed true to its open source commitment
#GTC25 featured our OSDI'24 work on disaggregated inference! 🚀🚀🚀 Who would have known a year ago that disaggregated prefill would become such a core technology in next-generation LLM inference! x.com/haoailab/statu…
Still optimizing throughput for LLM serving? Think again: goodput might be a better choice! Splitting prefill and decode onto different GPUs yields:
- up to 4.48x higher goodput
- up to 10.2x stricter latency criteria
Blog: hao-ai-lab.github.io/blogs/distserv… Paper: arxiv.org/abs/2401.09670
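Goodput here means requests per second that actually meet their latency SLOs (e.g., time-to-first-token and time-per-output-token), rather than raw request throughput. A toy calculation of the difference; the SLO thresholds and measurements below are made up for illustration:

```python
# Toy goodput vs throughput: count only requests meeting both SLOs.
TTFT_SLO = 0.2   # seconds to first token (assumed threshold)
TPOT_SLO = 0.05  # seconds per output token (assumed threshold)

requests = [  # (ttft_s, tpot_s) measured per request (fabricated data)
    (0.15, 0.04), (0.30, 0.04), (0.18, 0.06), (0.12, 0.03),
]
window_s = 2.0  # measurement window

throughput = len(requests) / window_s
goodput = sum(1 for ttft, tpot in requests
              if ttft <= TTFT_SLO and tpot <= TPOT_SLO) / window_s
print(f"throughput={throughput:.1f} req/s, goodput={goodput:.1f} req/s")
# -> throughput=2.0 req/s, goodput=1.0 req/s: half the served requests
#    violate an SLO, so optimizing raw throughput overstates quality.
```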
Beyond thrilled 🚀 to see my lab's work DistServe (OSDI'24) just got featured in Jensen Huang's keynote at Nvidia GTC! This marks our third major breakthrough in LLM inference after PagedAttention (vLLM) and Lookahead Decoding — pushing the frontier yet again! Since we post the…
Spotted @vllm_project during Jensen's Keynote @nvidia #GTC
Thank you @_akhaliq for sharing! Introducing our newest CVPR 2025 paper EgoLife: Towards Egocentric Life Assistant Homepage: egolife-ai.github.io Blog: egolife-ai.github.io/blog/ Code: github.com/EvolvingLMMs-L… EgoLife is a project focused on building AI-powered egocentric life…
EgoLife: Towards Egocentric Life Assistant introduces EgoLife, a project to develop an egocentric life assistant that accompanies users and enhances personal efficiency through AI-powered wearable glasses
Claude-3.7 was tested on Pokémon Red, but what about more real-time games like Super Mario 🍄🌟? We threw AI gaming agents into LIVE Super Mario games and found Claude-3.7 outperformed other models with simple heuristics. 🤯 Claude-3.5 is also strong, but less capable of…