Haocheng Xi
@HaochengXiUCB
First-year PhD in @berkeley_ai. Prev: Yao Class, @Tsinghua_Uni | Efficient Machine Learning & ML Systems
🎉 Come check out our poster at #ICML2025! 🚀 Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity 📍 East Exhibition Hall A-B — #E-3307 🗓️ Poster Session 2 | Tue, Jul 15 | 🕓 4:30–7:00 PM ⚡ We speed up video diffusion transformers by over…
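For intuition, here is a toy of what spatial-temporal attention sparsity can look like (my illustration, not the Sparse VideoGen implementation): with video tokens laid out frame-major, a "spatial" head attends only within a frame and a "temporal" head attends only to the same spatial slot across frames, so each head computes a thin slice of the full n×n attention.

```python
# Toy spatial vs. temporal sparse attention masks for video tokens, assuming a
# frame-major layout: index = frame * tokens_per_frame + slot. Illustrative only.
import torch

def spatial_mask(num_frames: int, tokens_per_frame: int) -> torch.Tensor:
    """Each token attends only to tokens in the same frame (block-diagonal)."""
    n = num_frames * tokens_per_frame
    frame_id = torch.arange(n) // tokens_per_frame
    return frame_id[:, None] == frame_id[None, :]

def temporal_mask(num_frames: int, tokens_per_frame: int) -> torch.Tensor:
    """Each token attends only to the same spatial slot across all frames."""
    n = num_frames * tokens_per_frame
    slot_id = torch.arange(n) % tokens_per_frame
    return slot_id[:, None] == slot_id[None, :]

# A head classified as "spatial" would use spatial_mask, a "temporal" head
# temporal_mask; either way far fewer than n^2 entries are computed.
m = spatial_mask(num_frames=4, tokens_per_frame=3)
print(m.float().mean())  # fraction of attention entries kept (here 0.25)
```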

Thank you @ericxtang and the @DaydreamLiveAI team for using StreamDiffusion in building such an exciting platform for creators and builders! Together with @cumulo_autumn and the rest of our team, we are continuously working to advance the open-source StreamDiffusion series for…
18 months ago, @cumulo_autumn and @Chenfeng_X released StreamDiffusion, bringing real-time AI video to consumer GPUs. Today, it powers a vibrant community of creators and builders. Our latest open-source release brings major upgrades in quality and control. Demo here 👇
Empowered by SGLang, NVILA serving now has 4.4x throughput and 2.2x faster response 🚀🚀🚀 Awesome work by @AndyZijianZhang w/ a lot of help from the SGLang team!
🚀Summer Fest Day 4: Turbocharging Vision-Language Models with SGLang + NVILA: 4.4× throughput and 2.2× faster response time! We've integrated NVILA into SGLang, enabling high-performance, scalable serving of vision-language models. This unlocks a 4.4× TPS boost and significantly…
The SkyRL roadmap is live! Our focus is on building the easiest-to-use high-performance RL framework for agents. We'd love your ideas, feedback, or code to guide the project: github.com/NovaSky-AI/Sky…
QuantSpec appears at #ICML2025! Check out our poster at E-2608, Poster Session 5, July 17 🎉
🚨Come check out our poster at #ICML2025! QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache 📍 East Exhibition Hall A-B — #E-2608 🗓️ Poster Session 5 | Thu, Jul 17 | 🕓 11:00 AM – 1:30 PM TLDR: Use a quantized version of the same model as its own draft…
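A minimal sketch of the self-speculative idea in the TLDR, with hypothetical `full_model` / `quant_model` greedy next-token callables standing in for the real thing (in practice the verification is one batched forward pass, not a Python loop):

```python
def speculative_step(full_model, quant_model, tokens, k=4):
    # 1) Draft k tokens cheaply with the quantized copy of the model.
    draft = list(tokens)
    for _ in range(k):
        draft.append(quant_model(draft))
    proposed = draft[len(tokens):]
    # 2) Verify with the full-precision model: accept until the first mismatch,
    #    then take the full model's token instead and discard the rest.
    accepted = []
    for i, tok in enumerate(proposed):
        target = full_model(draft[:len(tokens) + i])
        accepted.append(tok if target == tok else target)
        if target != tok:
            break
    return tokens + accepted

# Toy demo: the "quantized" drafter agrees with the target most of the time.
full = lambda seq: (len(seq) * 7) % 13
quant = lambda seq: (len(seq) * 7) % 13 if len(seq) % 5 else 0
print(speculative_step(full, quant, [1, 2, 3]))   # -> [1, 2, 3, 8, 2, 9]
```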
Video understanding isn't just recognition; it demands reasoning across thousands of frames. Meet Long-RL🚀 Highlights: 🧠 Dataset: LongVideo-Reason — 52K QAs with reasoning. ⚡ System: MR-SP, 2.1× faster RL for long videos. 📈 Scalability: Hour-long videos (3,600 frames) RL…
🔎 SkyRL + Search-R1 Training a multi-turn search agent doesn’t have to be complicated. With SkyRL, reproducing the SearchR1 recipe at high training throughput is quick and easy! We wrote up a detailed guide to show you how: novasky-ai.notion.site/skyrl-searchr1 1/N 🧵
🦆🚀QuACK🦆🚀: a new speed-of-light (SOL) memory-bound kernel library without a single line of CUDA C++, all straight in Python thanks to CuTe-DSL. On an H100 with 3 TB/s of memory bandwidth, it performs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 With @tedzadouri and @tri_dao
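For context on what "SOL mem-bound" means: a memory-bound kernel is at speed of light when its achieved bytes/s approaches peak HBM bandwidth. A rough way to measure that for any kernel (illustrative PyTorch timing, requires a CUDA GPU; QuACK itself is written in CuTe-DSL):

```python
# Back-of-envelope SOL check: compare achieved bandwidth of a memory-bound op
# against the ~3 TB/s HBM peak the tweet cites for H100. Illustrative only.
import time
import torch

x = torch.randn(1 << 28, device="cuda")            # ~1 GiB of fp32
torch.cuda.synchronize(); t0 = time.perf_counter()
for _ in range(100):
    y = x * 2.0                                    # reads x, writes y
torch.cuda.synchronize()
dt = (time.perf_counter() - t0) / 100

bytes_moved = 2 * x.numel() * x.element_size()     # one read + one write
print(f"achieved {bytes_moved / dt / 1e12:.2f} TB/s vs ~3 TB/s peak")
```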
🚀Check out #LPD - our latest work to accelerate autoregressive image generation. LPD stands for Locality-aware Parallel Decoding. ⚡️13× faster than traditional AR models and at least 3.4× faster than previous parallelized AR models. Github: github.com/mit-han-lab/lpd 🧵1/
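A toy sketch of what a locality-aware parallel schedule can look like (my illustration, not the LPD code): tokens decoded in the same step are kept spatially far apart on the H×W grid, so every token still has nearby, already-decoded context.

```python
# Toy parallel decoding schedule for an H x W token grid: each step decodes a
# strided sub-lattice, so simultaneously decoded tokens are spatially distant.
def lpd_schedule(H, W, stride=2):
    steps = []
    for dy in range(stride):
        for dx in range(stride):
            # One "step" decodes every stride-th token in both dimensions.
            steps.append([(y, x) for y in range(dy, H, stride)
                                 for x in range(dx, W, stride)])
    return steps

for i, group in enumerate(lpd_schedule(4, 4)):
    print(f"step {i}: {group}")   # 4 steps of 4 tokens instead of 16 AR steps
```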
🚀 Meet #RadialAttention — a static sparse attention mechanism with O(n log n) complexity for long video generation! ✅ Plug-and-play: works with pretrained models like #Wan, #HunyuanVideo, #Mochi ✅ Speeds up both training & inference by 2–4×, without quality loss 🧵1/4
Sparsity can make your LoRA fine-tuning go brrr 💨 Announcing SparseLoRA (ICML 2025): up to 1.6-1.9x faster LLM fine-tuning (2.2x less FLOPs) via contextual sparsity, while maintaining performance on tasks like math, coding, chat, and ARC-AGI 🤯 🧵1/ z-lab.ai/projects/spars…
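The core idea of contextual sparsity, in a hedged sketch (not the SparseLoRA code; the channel selector here is a simple stand-in): score output channels per input and compute only the most promising fraction.

```python
# Contextual sparsity in a linear layer: an input-dependent score picks which
# output channels to actually compute; the rest are skipped. Illustrative only.
import torch

def contextual_sparse_linear(x, W, keep_ratio=0.5):
    # x: (d_in,), W: (d_out, d_in). Cheap input-dependent score per channel.
    scores = W.abs() @ x.abs()            # upper bound on |W @ x| per channel
    k = max(1, int(keep_ratio * W.shape[0]))
    idx = scores.topk(k).indices          # keep the most promising channels
    y = torch.zeros(W.shape[0])
    y[idx] = W[idx] @ x                   # compute only k of d_out rows
    return y

x, W = torch.randn(64), torch.randn(128, 64)
print(contextual_sparse_linear(x, W).count_nonzero())  # 64 of 128 channels
```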
Attention! A new and improved attention mechanism has just been proposed by MIT, NVIDIA, Princeton, and others. Radial Attention is a sparse, static attention mechanism with O(n log n) complexity. It focuses on nearby tokens and shrinks the attention window over time. It can…
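One toy way to build a static mask in that spirit (my illustration, not the official Radial Attention mask): halve the attended window each time the frame distance doubles, which keeps roughly O(n log n) of the n² entries.

```python
# Toy "shrink the window with temporal distance" mask for video tokens laid out
# frame-major. Frames further apart attend over an exponentially smaller window.
import math
import torch

def radial_style_mask(num_frames, tokens_per_frame, base_window):
    n = num_frames * tokens_per_frame
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        for j in range(n):
            fdist = abs(i // tokens_per_frame - j // tokens_per_frame)
            window = base_window >> int(math.log2(fdist + 1))  # halve per octave
            mask[i, j] = abs(i % tokens_per_frame - j % tokens_per_frame) < window
    return mask

m = radial_style_mask(num_frames=8, tokens_per_frame=16, base_window=16)
print(f"kept {m.float().mean():.1%} of full attention")
```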
📢 New Preprint: Self-Challenging Agent (SCA) 📢 It's costly to scale agent tasks with reliable verifiers. In SCA, the key idea is to have a separate challenger explore the env and construct tasks along with verifiers. Here is how it achieves 2x improvements on general…
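A toy of that loop, with every component a stand-in (no real LLM or environment): the challenger emits (task, verifier) pairs, and the solver earns reward from programmatic checks it never wrote.

```python
# Toy self-challenging round: challenger explores the env and constructs tasks
# with verifiers; the verifier outcome becomes the solver's RL reward signal.
import random

env = {"capital_of_france": "paris", "2+2": "4", "hex_ff": "255"}

def challenger_make_task(env):
    key = random.choice(list(env))                      # "explore" the env
    verifier = lambda ans, gold=env[key]: ans == gold   # programmatic check
    return key, verifier

def solver_attempt(env, task):
    return env[task] if random.random() < 0.7 else "dunno"  # imperfect policy

rewards = []
for _ in range(100):
    task, verify = challenger_make_task(env)
    rewards.append(float(verify(solver_attempt(env, task))))
print(sum(rewards) / len(rewards))   # success rate -> reward for the update
```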
1/N 📢 Introducing UCCL (Ultra & Unified CCL), an efficient collective communication library for ML training and inference, outperforming NCCL by up to 2.5x 🚀 Code: github.com/uccl-project/u… Blog: uccl-project.github.io/posts/about-uc… Results: AllReduce on 6 HGX nodes across 2 racks over RoCE RDMA
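For readers new to CCLs, here is the textbook ring all-reduce schedule in miniature; libraries like UCCL and NCCL run this (and smarter variants) over RDMA NICs rather than NumPy arrays:

```python
# Ring all-reduce over p simulated ranks: reduce-scatter then all-gather, each
# in p-1 steps; every rank moves only 2*(p-1)/p of the data. Illustrative only.
import numpy as np

def ring_allreduce(parts):
    p = len(parts)
    chunks = [np.array_split(x.astype(float), p) for x in parts]
    for step in range(p - 1):                          # reduce-scatter phase
        sent = [chunks[r][(r - step) % p].copy() for r in range(p)]
        for r in range(p):
            chunks[(r + 1) % p][(r - step) % p] += sent[r]
    for step in range(p - 1):                          # all-gather phase
        sent = [chunks[r][(r + 1 - step) % p].copy() for r in range(p)]
        for r in range(p):
            chunks[(r + 1) % p][(r + 1 - step) % p] = sent[r]
    return [np.concatenate(c) for c in chunks]

world = [np.arange(6) + 10.0 * r for r in range(3)]    # 3 "GPUs"
out = ring_allreduce(world)
print(out[0])   # every rank ends with the elementwise sum: [30. ... 45.]
```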
We release Search Arena 🌐 — the first large-scale (24k+) dataset of in-the-wild user interactions with search-augmented LLMs. We also share a comprehensive report on user preferences and model performance in the search-enabled setting. Paper, dataset, and code in 🧵
🚀The code for Fast-dLLM is now open-source! 💥 Fast-dLLM achieves a 27.6× end-to-end speedup on 1024-token sequences with less than 2% accuracy drop. Check out the code here: github.com/NVlabs/Fast-dL…
🚀 Fast-dLLM: 27.6× Faster Diffusion LLMs with KV Cache & Parallel Decoding 💥 Key Features 🌟 - Block-Wise KV Cache: reuses 90%+ attention activations via bidirectional caching (prefix/suffix), enabling 8.1×–27.6× throughput gains with <2% accuracy loss 🔄 -…
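A hedged sketch of confidence-aware parallel decoding for one block (my stand-in `model_probs` returns per-position distributions; the real system also reuses the block-wise KV cache of finished prefix/suffix blocks across steps):

```python
# Each step commits every masked position whose top-1 probability clears a
# threshold -- multiple tokens per forward pass instead of one. Illustrative only.
import torch

def decode_block(model_probs, block, mask_id=-1, tau=0.9):
    block = block.clone()
    while (block == mask_id).any():
        probs = model_probs(block)                     # (block_len, vocab)
        conf, tok = probs.max(dim=-1)
        masked = block == mask_id
        commit = masked & (conf >= tau)                # parallel commits
        if not commit.any():                           # always make progress:
            commit = torch.zeros_like(masked)          # fall back to the single
            commit[conf.masked_fill(~masked, -1.0).argmax()] = True  # best token
        block[commit] = tok[commit]
    return block

# Toy run with random distributions; a low tau shows multi-token commits.
torch.manual_seed(0)
fake = lambda b: torch.softmax(torch.randn(b.numel(), 50), dim=-1)
print(decode_block(fake, torch.full((8,), -1), tau=0.1))
```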
Excited to begin my summer internship at NVIDIA! This marks my second time joining the company—though it’s my first experience working at the Santa Clara headquarters. Previously I worked on low-precision training and co-authored two projects: COAT, accepted at ICLR 2025, and…

1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. 🚀 Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and SFT model…
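The interactive loop being trained, sketched with a scripted stand-in policy and an in-memory SQLite database (the real policy is the trained LLM, and the final result is compared against gold for the RL reward):

```python
# Multi-turn Text-to-SQL rollout: the policy proposes SQL, executes it against
# a real database, observes results or errors, and refines before answering.
import sqlite3

def rollout(policy, conn, question, max_turns=5):
    history = [("question", question)]
    for _ in range(max_turns):
        sql, final = policy(history)               # model emits a query
        try:
            obs = conn.execute(sql).fetchall()     # ground-truth feedback
        except sqlite3.Error as e:
            obs = f"error: {e}"                    # errors are feedback too
        history.append((sql, obs))
        if final:
            break
    return history

# Toy run: a scripted "policy" that first probes the schema, then answers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INT, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'bob')")
steps = iter([("SELECT name FROM sqlite_master", False),
              ("SELECT COUNT(*) FROM users", True)])
print(rollout(lambda h: next(steps), conn, "How many users are there?"))
```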
It's likely the first open RL LLM pipeline that supports real-world, long-horizon, agentic RL training, like software engineering. The optimizations and abstractions are effective and general. (Kudos to the team, and happy to have contributed a small part while at Berkeley.)
Today we’re introducing SkyRL, an RL training pipeline optimized for long-horizon tasks like SWE-Bench, built on top of VeRL and OpenHands. SkyRL introduces an agent layer on top of VeRL: 1. Efficient multi-turn rollouts 2. Tool use 3. Scalable environment execution 1/N