Ziniu Li
@ZiniuLi
Ph.D. student @ CUHK, Shenzhen. Intern @Bytedance-Seed, working on RL and LLMs. Prev: Intern @Tencent-AI Lab.
🚀RL algorithms are shaping the post-training of LLMs, but how do their objectives connect? In this blog, I explore their relationships and provide a unified perspective through the Policy Gradient Theorem—the backbone of policy gradient methods. Dive in: lancelqf.github.io/note/llm_post_…
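For readers who want the theorem in code: below is a minimal REINFORCE-style sketch of the policy-gradient surrogate loss for LLM post-training (my illustration, not from the blog; the tensor shapes and the `baseline` argument are assumptions).

```python
import torch

def policy_gradient_loss(logprobs: torch.Tensor, rewards: torch.Tensor,
                         baseline: float = 0.0) -> torch.Tensor:
    """Surrogate loss whose gradient is the Policy Gradient Theorem estimator:
    grad E[R] = E[(R - b) * grad log pi(y|x)].

    logprobs: (batch, seq) per-token log-probs of sampled responses under pi.
    rewards:  (batch,) scalar reward for each sampled response.
    """
    advantage = (rewards - baseline).detach()   # stop-grad: only log pi carries gradient
    seq_logprob = logprobs.sum(dim=-1)          # log pi(y|x) = sum of token log-probs
    return -(advantage * seq_logprob).mean()    # minimizing this ascends expected reward
```

Variants like PPO and GRPO mostly differ in how the advantage is estimated and whether the likelihood ratio is clipped; the gradient they take still traces back to this estimator.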
Amazing work by @RidgerZhu, with more resources for investigating the mechanisms behind hybrid linear attention. Resources: Paper: arxiv.org/pdf/2507.06457 Hugging Face checkpoints: huggingface.co/collections/m-…
Hybrid architectures mix linear & full attention in LLMs. But which linear attention is best? Until now, this choice has mostly been guesswork. In our new work, we stop guessing: we trained and open-sourced 72 MODELS (340M & 1.3B) to dissect what truly makes a hybrid model tick 🧶
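A toy sketch of the hybrid idea (my illustration, not the paper's recipe; the elu+1 feature map and the "full attention every 4th layer" ratio are assumptions):

```python
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq, dim). Kernel trick: phi(x) = elu(x) + 1.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum('bhsd,bhse->bhsde', k, v).cumsum(dim=2)  # prefix sums of phi(k) v^T
    z = k.cumsum(dim=2)                                        # prefix sums of phi(k)
    num = torch.einsum('bhsd,bhsde->bhse', q, kv)
    den = torch.einsum('bhsd,bhsd->bhs', q, z).unsqueeze(-1) + eps
    return num / den  # causal, linear in seq length: no seq x seq score matrix

def causal_full_attention(q, k, v):
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

def layer_plan(n_layers, ratio=4):
    # One common hybrid pattern: full attention every `ratio`-th layer.
    return ['full' if (i + 1) % ratio == 0 else 'linear' for i in range(n_layers)]

print(layer_plan(12))  # ['linear', 'linear', 'linear', 'full', ...]
```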
🚀 Thrilled to announce that our paper "SCRIT: Self-Evolving LLM Critique without Human or Stronger Models" was accepted to #COLM2025! We enable LLMs to self-improve their critique abilities — zero human annotations, zero stronger models needed! 🔄✨ Looking forward to meeting…
🚀 Critique abilities are key for scaling LLMs, but current open-source models fall short. We introduce SCRIT: a framework with scalable oversight that enables LLMs to self-improve their critique skills✨ We’ve built a pipeline to generate high-quality synthetic critique data…
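One plausible shape of such a pipeline, as a hedged sketch (the callables `generate`, `critique`, and `self_validate` are hypothetical stand-ins, not SCRIT's actual API):

```python
def build_critique_dataset(problems, generate, critique, self_validate):
    """Collect (solution, critique) pairs that survive the model's own validation.
    No human labels, no stronger model: the same LLM plays all three roles."""
    dataset = []
    for problem in problems:
        solution = generate(problem)                  # model drafts a solution
        review = critique(problem, solution)          # model critiques its own draft
        if self_validate(problem, solution, review):  # keep only self-consistent critiques
            dataset.append({"problem": problem,
                            "solution": solution,
                            "critique": review})
    return dataset  # fine-tune on this, then iterate with the improved model
```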
We’re excited to share our new paper “CoRT: Code-integrated Reasoning within Thinking”! 🤖 A post-training framework that teaches Large Reasoning Models (LRMs) to better leverage Code Interpreters for enhanced mathematical reasoning. 🔍 Key Highlights: Strategic hint…
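As a rough picture of what code-integrated reasoning looks like at inference time (a hedged sketch; `model_step` and the `<code>` delimiter are my assumptions, and this is not CoRT's training code):

```python
import contextlib, io, re

CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def run_snippet(code: str) -> str:
    """Execute a model-emitted snippet and capture stdout.
    Toy sandbox only; a real system would use an isolated interpreter."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()

def reason_with_interpreter(model_step, prompt: str, max_turns: int = 4) -> str:
    """Alternate generation and execution: the model thinks, emits code when
    helpful, sees the interpreter output, and continues reasoning."""
    transcript = prompt
    for _ in range(max_turns):
        text = model_step(transcript)  # hypothetical: transcript -> next model text
        transcript += text
        match = CODE_RE.search(text)
        if match is None:
            break  # model finished without requesting execution
        transcript += "\n[interpreter output]\n" + run_snippet(match.group(1))
    return transcript
```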
D4RL is (almost) solved—many tasks now score >95. Yet real systems still crash, because the benchmarks are too clean. Meet NeoRL-2 🚀: 7 envs with stochastic delays, exogenous disturbances, hard safety constraints & data-scarce logs. Challenge accepted? 📄 arxiv.org/abs/2503.19267
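To make the delay challenge concrete, here is a toy Gymnasium wrapper (my illustration only, not the NeoRL-2 API) in which observations arrive a random number of steps late:

```python
import random
from collections import deque
import gymnasium as gym

class StochasticDelayWrapper(gym.Wrapper):
    """Delivers stale observations: the agent sees a state from 1..max_delay
    steps in the past, with the delay re-sampled every step."""
    def __init__(self, env, max_delay=3):
        super().__init__(env)
        self.max_delay = max_delay
        self.buffer = deque()

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.buffer = deque([obs])
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.buffer.append(obs)
        delay = random.randint(1, self.max_delay)
        while len(self.buffer) > delay:   # drop states older than the sampled delay
            self.buffer.popleft()
        return self.buffer[0], reward, terminated, truncated, info
```

Policies tuned on clean, fully observed benchmarks can degrade sharply under even this mild perturbation.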
Thrilled to share our paper "ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling" has been accepted by Operations Research! 🎉 This is the FIRST LLM paper in the 70+ year history of this prestigious journal. Our framework improves modeling…
New paper alert! We report that the Hessian of NNs has a very special structure: 1. it appears to be a "block-diagonal-block-circulant" matrix at initialization; 2. then it quickly evolves into a "near-block-diagonal" matrix along training. We then theoretically reveal two…
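For small networks, the block structure can be inspected directly; a minimal PyTorch sketch (toy MLP and data of my choosing), where block (i, j) of the Hessian pairs the parameters of layers i and j:

```python
import torch
from torch import nn
from torch.func import functional_call

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))
x, y = torch.randn(32, 4), torch.randint(0, 3, (32,))

names, shapes = zip(*[(n, p.shape) for n, p in model.named_parameters()])
sizes = [s.numel() for s in shapes]

def loss_from_flat(flat):
    # Rebuild the parameter dict from one flat vector so the full Hessian
    # over all parameters is a single (P, P) matrix.
    chunks = flat.split(sizes)
    params = {n: c.view(s) for n, c, s in zip(names, chunks, shapes)}
    logits = functional_call(model, params, (x,))
    return nn.functional.cross_entropy(logits, y)

flat0 = torch.cat([p.detach().reshape(-1) for p in model.parameters()])
H = torch.autograd.functional.hessian(loss_from_flat, flat0)
print(H.shape)  # (P, P); visualize |H| to see the (near-)block-diagonal pattern
```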
🚨 RL x LLM folks at #ICLR2025 — come join us during the Friday lunch break! If you haven’t RSVP’d on Whova, you can also register here: lu.ma/s8udv997?tk=B4… @Benjamin_eecs and I will scout for a chill spot (likely a corner at the venue) and share the location tomorrow.…
🎉 I'll be attending #ICLR2025 and the #Alignment Workshop in Singapore next week! If you're interested in LLMs, RL theory & algorithms, reasoning, optimization, and multimodal LLMs—let's connect! 🚀 I'll be presenting our latest research: 📌 Preserving Diversity in Supervised…
