Xiuyu Li @ ICML
@xiuyu_l
Efficiently scaling agents. CS PhD student @berkeley_ai. Prev @NVIDIA @AIatMeta @Cornell.
Scale smarter, not harder! Long CoT reasoning is powerful, but its sequential nature limits how efficiently and easily it can scale. We incentivize LMs to divide and conquer subtasks in parallel, selectively gathering only the highest-quality explorations.
We explore a new dimension in scaling reasoning models in Adaptive Parallel Reasoning. APR lets LMs learn to orchestrate both serial & parallel compute E2E via supervised training + RL, with better efficiency and scalability than long CoT on Countdown 🧵 arxiv.org/abs/2504.15466
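A minimal sketch of the fan-out/gather pattern described above (not the APR training code): a parent call spawns parallel explorations of subtasks, keeps only the highest-scoring ones, and continues serially. `llm_call` and `score` are hypothetical placeholders, not the paper's API.

```python
# Sketch of divide-and-conquer reasoning with selective gathering.
from concurrent.futures import ThreadPoolExecutor

def llm_call(prompt: str) -> str:
    """Stand-in for a language model request; swap in a real client."""
    return f"exploration for: {prompt}"

def score(trace: str) -> float:
    """Stand-in quality signal for an exploration (e.g., a verifier score)."""
    return float(len(trace))

def solve(task: str, subtasks: list[str], keep: int = 2) -> str:
    # Parent fans out: each subtask is explored by an independent, parallel LM call.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        traces = list(pool.map(llm_call, subtasks))
    # Selectively gather only the highest-quality explorations...
    best = sorted(traces, key=score, reverse=True)[:keep]
    # ...and continue serial reasoning conditioned on the selected traces.
    return llm_call(task + "\n" + "\n".join(best))

print(solve("make 24 from 3, 5, 7, 9", ["try 3*5 first", "try 9-7 first", "try 7+9 first"]))
```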
Excited to be partnering with @HenryYin_ and @naomiiixia from @agihouse_org to host a deep-dive session on some of the most topical recent research in RL. We’ll have amazing researchers: @jiayi_pirate talking about his recent work on Adaptive Parallel Reasoning, and…
Excited to be presenting SparseLoRA with @xiuyu_l this Thursday at #ICML! Catch us from 11 AM to 1:30 PM at East Exhibition Hall A-B, Poster #E-3004. Come by to chat about sparsity, fine-tuning, and more! 👋
PEFT methods like LoRA and QLoRA brought major memory savings to fine-tuning. SparseLoRA introduces on-the-fly contextual sparsity, making training both parameter and compute-efficient 🚀 Less compute, same performance – try it out: z-lab.ai/projects/spars…
Video understanding isn't just recognition, it demands reasoning across thousands of frames. Meet Long-RL 🚀 Highlights:
🧠 Dataset: LongVideo-Reason, 52K QAs with reasoning.
⚡ System: MR-SP, 2.1× faster RL for long videos.
📈 Scalability: hour-long videos (3,600 frames) RL…
🚀 Meet #RadialAttention, a static sparse attention mechanism with O(n log n) complexity for long video generation!
✅ Plug-and-play: works with pretrained models like #Wan, #HunyuanVideo, #Mochi
✅ Speeds up both training & inference by 2–4×, without quality loss
🧵1/4
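For intuition on how a static sparse pattern gets to O(n log n) cost, here is a toy mask where each query attends to a local window plus exponentially spaced tokens. This only illustrates the complexity argument; it is not RadialAttention's actual mask.

```python
# Toy static sparse mask with ~O(n log n) attended pairs.
import torch

def log_sparse_mask(n: int, window: int = 4) -> torch.Tensor:
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True                     # dense local window
        step = 1
        while i - step >= 0 or i + step < n:      # exponentially growing strides
            for j in (i - step, i + step):
                if 0 <= j < n:
                    mask[i, j] = True
            step *= 2
    return mask

m = log_sparse_mask(64)
print(m.sum().item(), "attended pairs vs", 64 * 64, "dense")
```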
While top-k SVD on LLM weights isn't always accurate enough as a drop-in replacement, it excels in capturing key structures and absorbing outliers (e.g. #SVDQuant). Using it as an auxiliary signal or prior seems underexplored, and many cool methods could be built on it
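A quick sketch of the top-k SVD split being referred to, assuming a plain rank-k truncation: the low-rank piece absorbs the dominant structure and outliers, while the residual can be handled separately (e.g., quantized) or the factors reused as an auxiliary signal.

```python
# Rank-k SVD split: low-rank structure + residual.
import torch

def topk_svd_split(W: torch.Tensor, k: int = 32):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]   # rank-k part: key structure + outliers
    R = W - L                                      # residual with a much flatter spectrum
    return L, R

W = torch.randn(1024, 1024)
L, R = topk_svd_split(W)
print(f"residual / original Frobenius norm: {(R.norm() / W.norm()).item():.3f}")
```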
SVD is playing an increasingly important role in LLMs. It has been widely used for model compression (e.g., our previous Dobi-SVD) and memory-efficient training (e.g., GaLore). Samir, Xiuyu, and Junxian take a creative step further by leveraging SVD to dynamically select a sparse…
Sparsity can make your LoRA fine-tuning go brrr 💨 Announcing SparseLoRA (ICML 2025): up to 1.6–1.9× faster LLM fine-tuning (2.2× fewer FLOPs) via contextual sparsity, while maintaining performance on tasks like math, coding, chat, and ARC-AGI 🤯 🧵1/ z-lab.ai/projects/spars…
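A simplified sketch of what contextual sparsity in a LoRA fine-tuning step might look like: a cheap low-rank (SVD) proxy of the frozen weight scores FFN channels per input, the main matmuls touch only the selected channels, and the dense LoRA branch stays trainable. The class and predictor here are illustrative, not the SparseLoRA implementation.

```python
# Toy FFN layer with input-dependent (contextual) channel sparsity + a LoRA branch.
import torch
import torch.nn as nn

class SparseFFNWithLoRA(nn.Module):
    def __init__(self, d: int = 512, d_ff: int = 2048, r: int = 16,
                 rank_proxy: int = 8, keep: float = 0.25):
        super().__init__()
        self.up, self.down = nn.Linear(d, d_ff), nn.Linear(d_ff, d)
        self.lora_a = nn.Linear(d, r, bias=False)
        self.lora_b = nn.Linear(r, d, bias=False)
        self.keep = keep
        # Low-rank (SVD) proxy of the frozen up-projection, used as a cheap
        # channel-importance predictor (stand-in for the paper's estimator).
        U, S, Vh = torch.linalg.svd(self.up.weight.detach(), full_matrices=False)
        self.register_buffer("pU", U[:, :rank_proxy] * S[:rank_proxy])
        self.register_buffer("pV", Vh[:rank_proxy])

    def forward(self, x: torch.Tensor) -> torch.Tensor:           # x: [tokens, d]
        # Approximate |x @ W_up^T| via the low-rank proxy, then pick top channels.
        scores = ((x @ self.pV.T) @ self.pU.T).abs().mean(0)       # [d_ff]
        idx = scores.topk(int(self.keep * scores.numel())).indices
        # Main path computes only the selected FFN channels.
        h = torch.relu(x @ self.up.weight[idx].T + self.up.bias[idx])
        y = h @ self.down.weight[:, idx].T + self.down.bias
        # Dense, trainable LoRA branch on top.
        return y + self.lora_b(self.lora_a(x))

layer = SparseFFNWithLoRA()
print(layer(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```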