Weiyang Liu
@Besteuler
Curious AI researcher @CUHKofficial. Postdoc @MPI_IS. PhD @Cambridge_Uni & @GeorgiaTech. Previous Intern @Google & @nvidia. All opinions are my own.
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
Wild paper. They prove (!!) that a transformer block (Attn + MLP) running on a prompt outputs the same logits with no prompt, if the MLP weights are updated by a rank-1 term:
W′ = W + ΔW, where ΔW = (W·Δa) × (A(x)ᵀ / ‖A(x)‖²) and Δa = A(C, x) − A(x) for prompt C.
Fucking fine-tuning.
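The identity above can be checked numerically. A toy numpy sketch, not the paper's actual setup: it treats the MLP as a single linear map W and uses random vectors as stand-ins for the attention activations A(x) and A(C, x).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # toy MLP weight (single linear map)
a_x = rng.normal(size=d)      # stand-in for A(x), attention output with no prompt
a_cx = rng.normal(size=d)     # stand-in for A(C, x), attention output with prompt C

delta_a = a_cx - a_x          # Δa = A(C, x) − A(x)
# ΔW = (W·Δa) × (A(x)ᵀ / ‖A(x)‖²): a rank-1 outer-product update
delta_W = np.outer(W @ delta_a, a_x) / np.dot(a_x, a_x)

# The patched weights applied to the prompt-free activation reproduce
# the prompted computation: (W + ΔW)·A(x) == W·A(C, x)
assert np.allclose((W + delta_W) @ a_x, W @ a_cx)
```

The trick is that A(x)ᵀA(x)/‖A(x)‖² = 1, so the rank-1 term contributes exactly W·Δa when evaluated at A(x).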
MIT's Advanced Data Structures by Prof. Erik Demaine Lecture notes: courses.csail.mit.edu/6.897/spring03…
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
Kimi K2 paper dropped! It describes: - the MuonClip optimizer - a large-scale agentic data synthesis pipeline that systematically generates tool-use demonstrations via simulated and real-world environments - an RL framework that combines RLVR with a self-critique rubric reward mechanism…
Great excuse to share something I really love: 1-Lipschitz nets. They give clean theory, certs for robustness, the right loss for W-GANs, and even nicer grads for explainability!! Yet they're still niche. Here’s a speed-run through some of my favorite papers in the field. 🧵👇
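As background for the thread: one standard way to make a linear layer 1-Lipschitz is spectral normalization, i.e. dividing the weights by their largest singular value (estimated by power iteration). A minimal numpy sketch under those assumptions; the function names are mine, not from any paper in the thread.

```python
import numpy as np

def spectral_norm(W, iters=50):
    # Power iteration: estimate the largest singular value of W.
    v = np.random.default_rng(0).normal(size=W.shape[1])
    u = W @ v
    for _ in range(iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)

def make_1_lipschitz(W):
    # For a linear map, the Lipschitz constant (in L2) is the largest
    # singular value, so rescaling by it bounds the constant by 1.
    return W / spectral_norm(W)

W = np.random.default_rng(1).normal(size=(16, 16)) * 3.0
W_hat = make_1_lipschitz(W)
assert np.linalg.svd(W_hat, compute_uv=False)[0] <= 1.0 + 1e-4
```

Note this only caps the per-layer constant; compositions of such layers are then 1-Lipschitz end to end (with 1-Lipschitz activations like ReLU).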
optimization theorem: "assume a lipschitz constant L..." the lipschitz constant:
Speaking as a past IMO contestant, this is impressive but misleading - gold vs silver is meaningless, 1 pt below gold vs borderline gold is noise The impressive bit is using a general reasoning model, not a specialised system, and no verified reward. Peak AI maths is unchanged
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…
While working on the improved version of Orthogonal Finetuning (OFT) (spherelab.ai/oftv2), we also found that OFT represents a more general class of finetuning methods -- sequential adaptation. This invites an interesting comparison to LoRA, which represents parallel…
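The sequential-vs-parallel contrast can be sketched in a few lines of numpy: LoRA adds a low-rank term to a frozen weight, while OFT multiplies it by a learned orthogonal matrix (here parametrized via a Cayley transform; the matrices below are random stand-ins for learned parameters, not a real finetuning run).

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 16, 4
W = rng.normal(size=(d, d))      # frozen pretrained weight

# LoRA: parallel (additive) adaptation, W' = W + B A with low-rank B, A.
B, A = rng.normal(size=(d, r)), rng.normal(size=(r, d))
W_lora = W + B @ A

# OFT: sequential (multiplicative) adaptation, W' = R W with R orthogonal.
# Cayley parametrization: R = (I − S)(I + S)^{-1} for skew-symmetric S,
# which is orthogonal for any S (I + S is always invertible).
M = rng.normal(size=(d, d))
S = M - M.T                      # skew-symmetric parameter
I = np.eye(d)
R = (I - S) @ np.linalg.inv(I + S)
W_oft = R @ W

# R is orthogonal, so the OFT update preserves the norms (and pairwise
# angles) of W's columns, which the additive LoRA update does not.
assert np.allclose(R.T @ R, I)
assert np.allclose(np.linalg.norm(W_oft, axis=0), np.linalg.norm(W, axis=0))
```

The preserved angles/norms are the geometric property (hyperspherical energy) that motivates the orthogonality constraint in OFT.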

One thing my team has discovered is the consistent effectiveness of Quantized OFT (QOFT). QOFT works significantly better and more stably than QLoRA: not only better adaptation performance and stability, but also shorter finetuning time and lower GPU memory usage.
📔What really makes OFTv2 shine is its great compatibility with quantized models. Here comes QOFT. Without bells and whistles, QOFT outperforms QLoRA significantly in adaptation performance, GPU memory usage, and runtime. QOFT is simply better. 🧵3/6
For a quick start with OFTv2 and QOFT, check out our Colab tutorial: drive.google.com/drive/folders/… We give examples on finetuning standard/quantized LLMs (Qwen) and Stable Diffusion 3.5. Kudos to @ZejuQiu36055 for preparing the notebook!
🚀 Meet OFTv2 — Orthogonal Finetuning made scalable, finally. ⚡️ 10× faster 💾 3× less GPU memory 🤖 Quantized OFT: plug-and-play on quantized LLMs, better than QLoRA Try it now on Hugging Face PEFT: tinyurl.com/ycxswfe7 Website: spherelab.ai/oftv2/ #AI #LLM 🧵1/6