Xiuyu Li @ ICML
@xiuyu_l
Efficiently scaling agents. CS PhD student @berkeley_ai. Prev @NVIDIA @AIatMeta @Cornell.
Scale smarter, not harder! Long CoT reasoning is powerful, but its sequential nature limits how efficiently and easily it can scale. We incentivize LMs to divide and conquer subtasks in parallel, selectively gathering only the highest-quality explorations.
We explore a new dimension in scaling reasoning models in Adaptive Parallel Reasoning. APR lets LMs learn to orchestrate both serial & parallel compute E2E via supervised training + RL, with better efficiency and scalability than long CoT on Countdown 🧵 arxiv.org/abs/2504.15466
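A minimal sketch of the fan-out/gather pattern described above (not the APR training code): a parent call spawns parallel explorations of subtasks, keeps only the highest-scoring ones, and continues serially. `llm_call` and `score` are hypothetical placeholders, not the paper's API.

```python
# Sketch of divide-and-conquer reasoning with selective gathering.
from concurrent.futures import ThreadPoolExecutor

def llm_call(prompt: str) -> str:
    """Stand-in for a language model request; swap in a real client."""
    return f"exploration for: {prompt}"

def score(trace: str) -> float:
    """Stand-in quality signal for an exploration (e.g., a verifier score)."""
    return float(len(trace))

def solve(task: str, subtasks: list[str], keep: int = 2) -> str:
    # Parent fans out: each subtask is explored by an independent, parallel LM call.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        traces = list(pool.map(llm_call, subtasks))
    # Selectively gather only the highest-quality explorations...
    best = sorted(traces, key=score, reverse=True)[:keep]
    # ...and continue serial reasoning conditioned on the selected traces.
    return llm_call(task + "\n" + "\n".join(best))

print(solve("make 24 from 3, 5, 7, 9", ["try 3*5 first", "try 9-7 first", "try 7+9 first"]))
```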
Excited to be partnering with @HenryYin_ and @naomiiixia from @agihouse_org to host a deep-dive session on some of the most topical recent research in RL. We’ll have amazing researchers: @jiayi_pirate talking about his recent work on Adaptive Parallel Reasoning, and…
Excited to be presenting SparseLoRA with @xiuyu_l this Thursday at #ICML! Catch us from 11 AM to 1:30 PM at East Exhibition Hall A-B, Poster #E-3004. Come by to chat about sparsity, fine-tuning, and more! 👋
PEFT methods like LoRA and QLoRA brought major memory savings to fine-tuning. SparseLoRA introduces on-the-fly contextual sparsity, making training both parameter and compute-efficient 🚀 Less compute, same performance – try it out: z-lab.ai/projects/spars…
Video understanding isn't just recognition, it demands reasoning across thousands of frames. Meet Long-RL 🚀 Highlights:
🧠 Dataset: LongVideo-Reason, 52K QAs with reasoning.
⚡ System: MR-SP, 2.1× faster RL for long videos.
📈 Scalability: hour-long videos (3,600 frames) RL…
🚀 Meet #RadialAttention, a static sparse attention mechanism with O(n log n) complexity for long video generation!
✅ Plug-and-play: works with pretrained models like #Wan, #HunyuanVideo, #Mochi
✅ Speeds up both training & inference by 2–4×, without quality loss
🧵1/4
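For intuition on how a static sparse pattern gets to O(n log n) cost, here is a toy mask where each query attends to a local window plus exponentially spaced tokens. This only illustrates the complexity argument; it is not RadialAttention's actual mask.

```python
# Toy static sparse mask with ~O(n log n) attended pairs.
import torch

def log_sparse_mask(n: int, window: int = 4) -> torch.Tensor:
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True                     # dense local window
        step = 1
        while i - step >= 0 or i + step < n:      # exponentially growing strides
            for j in (i - step, i + step):
                if 0 <= j < n:
                    mask[i, j] = True
            step *= 2
    return mask

m = log_sparse_mask(64)
print(m.sum().item(), "attended pairs vs", 64 * 64, "dense")
```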
While top-k SVD on LLM weights isn't always accurate enough as a drop-in replacement, it excels in capturing key structures and absorbing outliers (e.g. #SVDQuant). Using it as an auxiliary signal or prior seems underexplored, and many cool methods could be built on it
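A quick sketch of the top-k SVD split being referred to, assuming a plain rank-k truncation: the low-rank piece absorbs the dominant structure and outliers, while the residual can be handled separately (e.g., quantized) or the factors reused as an auxiliary signal.

```python
# Rank-k SVD split: low-rank structure + residual.
import torch

def topk_svd_split(W: torch.Tensor, k: int = 32):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]   # rank-k part: key structure + outliers
    R = W - L                                      # residual with a much flatter spectrum
    return L, R

W = torch.randn(1024, 1024)
L, R = topk_svd_split(W)
print(f"residual / original Frobenius norm: {(R.norm() / W.norm()).item():.3f}")
```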
SVD is playing an increasingly important role in LLMs. It has been widely used for model compression (e.g., our previous Dobi-SVD) and memory-efficient training (e.g., GaLore). Samir, Xiuyu, and Junxian take a creative step further by leveraging SVD to dynamically select a sparse…
Sparsity can make your LoRA fine-tuning go brrr 💨 Announcing SparseLoRA (ICML 2025): up to 1.6–1.9× faster LLM fine-tuning (2.2× fewer FLOPs) via contextual sparsity, while maintaining performance on tasks like math, coding, chat, and ARC-AGI 🤯 🧵1/ z-lab.ai/projects/spars…
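A simplified sketch of what contextual sparsity in a LoRA fine-tuning step might look like: a cheap low-rank (SVD) proxy of the frozen weight scores FFN channels per input, the main matmuls touch only the selected channels, and the dense LoRA branch stays trainable. The class and predictor here are illustrative, not the SparseLoRA implementation.

```python
# Toy FFN layer with input-dependent (contextual) channel sparsity + a LoRA branch.
import torch
import torch.nn as nn

class SparseFFNWithLoRA(nn.Module):
    def __init__(self, d: int = 512, d_ff: int = 2048, r: int = 16,
                 rank_proxy: int = 8, keep: float = 0.25):
        super().__init__()
        self.up, self.down = nn.Linear(d, d_ff), nn.Linear(d_ff, d)
        self.lora_a = nn.Linear(d, r, bias=False)
        self.lora_b = nn.Linear(r, d, bias=False)
        self.keep = keep
        # Low-rank (SVD) proxy of the frozen up-projection, used as a cheap
        # channel-importance predictor (stand-in for the paper's estimator).
        U, S, Vh = torch.linalg.svd(self.up.weight.detach(), full_matrices=False)
        self.register_buffer("pU", U[:, :rank_proxy] * S[:rank_proxy])
        self.register_buffer("pV", Vh[:rank_proxy])

    def forward(self, x: torch.Tensor) -> torch.Tensor:           # x: [tokens, d]
        # Approximate |x @ W_up^T| via the low-rank proxy, then pick top channels.
        scores = ((x @ self.pV.T) @ self.pU.T).abs().mean(0)       # [d_ff]
        idx = scores.topk(int(self.keep * scores.numel())).indices
        # Main path computes only the selected FFN channels.
        h = torch.relu(x @ self.up.weight[idx].T + self.up.bias[idx])
        y = h @ self.down.weight[:, idx].T + self.down.bias
        # Dense, trainable LoRA branch on top.
        return y + self.lora_b(self.lora_a(x))

layer = SparseFFNWithLoRA()
print(layer(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```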