Liliang Ren
@liliang_ren
Senior Researcher at Microsoft GenAI | UIUC CS PhD graduate | Efficient LLM | NLP | Former Intern @MSFTResearch @Azure @AmazonScience
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 github.com/microsoft/Arch… (1/4)
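For context, μP-style parameterizations keep training stable across model widths by rescaling per-layer learning rates relative to a small tuned base model, so hyperparameters transfer from cheap proxy runs to large ones. Below is a minimal, hedged sketch of that idea in PyTorch; the widths, base learning rate, and the 1/width rule illustrate standard μP, not the actual μP++ recipe in the repo.

```python
# Minimal sketch of muP-style width scaling (illustrative only; not the
# muP++ recipe from the Phi4-mini-Flash repo). Under standard muP with Adam,
# hidden-layer learning rates shrink roughly as 1/width relative to a base
# model where the hyperparameters were tuned.
import torch
import torch.nn as nn

BASE_WIDTH = 256       # width where hyperparameters were tuned (assumed)
TARGET_WIDTH = 4096    # width of the large training run (assumed)
BASE_LR = 3e-3         # learning rate tuned at BASE_WIDTH (assumed)

mult = BASE_WIDTH / TARGET_WIDTH  # muP width multiplier (< 1 when scaling up)

model = nn.Sequential(
    nn.Linear(512, TARGET_WIDTH),           # input projection
    nn.GELU(),
    nn.Linear(TARGET_WIDTH, TARGET_WIDTH),  # hidden layer: lr scales ~1/width
    nn.GELU(),
    nn.Linear(TARGET_WIDTH, 512),           # output projection
)

# Per-parameter-group learning rates: square "hidden" matrices get the
# 1/width-scaled lr. Full muP also prescribes separate rules for embedding
# and readout layers, which this sketch elides.
param_groups = []
for module in model:
    if isinstance(module, nn.Linear):
        is_hidden = module.in_features == module.out_features
        lr = BASE_LR * mult if is_hidden else BASE_LR
        param_groups.append({"params": module.parameters(), "lr": lr})

optimizer = torch.optim.AdamW(param_groups)
```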
See our work in the workshop today. If you are looking for opportunities to work on efficient model architectures, or anything that makes training and inference much faster on thousands of GPUs or more, please come talk to us or DM me. We are hiring.
We are hiring! If you are interested in efficient architectures or in making training and inference on thousands of GPUs much faster, please feel free to DM me or @WeizhuChen! We are doing RL at very large scale!
Just arrived at ICML. Please drop me a message if you are here and would like to chat. We are hiring.
🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to @COLM_conf 2025! We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
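As a rough illustration of the hybrid idea (not Manticore's actual code): two pretrained blocks, say an attention block and an SSM block, can be combined with a learned gate instead of training a new architecture from scratch. Everything below is a hypothetical sketch; projectors that align mismatched hidden sizes are elided.

```python
# Hedged sketch of a gated hybrid block built from two pretrained modules.
# This illustrates the concept of hybridizing pretrained LMs, not the
# actual Manticore implementation.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, block_a: nn.Module, block_b: nn.Module):
        super().__init__()
        self.block_a = block_a                     # e.g. a pretrained attention block
        self.block_b = block_b                     # e.g. a pretrained SSM block
        self.alpha = nn.Parameter(torch.zeros(1))  # learned mixing logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.alpha)  # convex combination of the two paths
        return w * self.block_a(x) + (1 - w) * self.block_b(x)
```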
Microsoft just dropped Phi-4-mini-flash-reasoning.
- built on a new hybrid architecture
- 10X higher throughput and a 2 to 3X reduction in latency
- significantly faster inference without sacrificing reasoning performance
Microsoft swaps most of that heavy work for a lean…
Meet Phi-4-mini-flash-reasoning: a fast, low-latency SLM built for scale with its novel SambaY architecture. Available on Azure AI Foundry and Hugging Face. Experience advanced reasoning capabilities here: msft.it/6018SAmHn
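A quick-start sketch for trying the model from Hugging Face, assuming the standard transformers text-generation API; the repo id and generation settings below are assumptions, so check the model card for the exact values.

```python
# Quick-start sketch (repo id assumed to be
# "microsoft/Phi-4-mini-flash-reasoning"; verify against the model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-flash-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # place weights on available GPUs
    trust_remote_code=True,  # hybrid architectures often ship custom code
)

messages = [{"role": "user", "content": "Solve: what is 12 * 17 - 5?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```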
Excited to see the next-gen tokenizer-free model that can filter out redundancy in sequences efficiently (?)
I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit) In a few days, we'll release what I believe is the next major advance for architectures.
Research with amazing collaborators @JizeJiang, @MeitangLi, and @JingchengYang, guided by great advisors and supported by the generous help of talented researchers @BowenJin13, @XingyuFu2, and many open-source contributors (easyr1, verl, vllm, etc.).
Excited to introduce VTool-R1! We’ve trained VLMs to “think visually” using RL, blending Python-based 🖼️visual edits with💡textual Chain-of-Thought reasoning. Our trained qwen2.5-VL-32B surpasses GPT-4o on ChartQA & TableVQA, and even the compact qwen2.5-VL-7B significantly…
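Sketching the "think visually" loop described above: the VLM interleaves textual chain-of-thought with Python-based visual edits whose outputs are appended as new visual context for the next step. The tool names and the toy CROP(...)/ANSWER: protocol below are hypothetical, not the actual VTool-R1 API.

```python
# Illustrative visual-edit reasoning loop; all names are hypothetical.
import ast
from PIL import Image

def crop_region(image: Image.Image, box: tuple) -> Image.Image:
    """A 'visual edit' tool: zoom into a chart or table region of interest."""
    return image.crop(box)

def reasoning_loop(vlm, image: Image.Image, question: str, max_steps: int = 4):
    """`vlm(images, prompt) -> str` is a placeholder for the policy model."""
    context = [image]  # visual context grows as edits are made
    prompt = question
    for _ in range(max_steps):
        step = vlm(context, prompt)  # textual CoT + optional tool call
        if "CROP(" in step:          # model requested a visual edit
            args = step.split("CROP(")[1].split(")")[0]
            box = ast.literal_eval("(" + args + ")")  # e.g. (x0, y0, x1, y1)
            context.append(crop_region(image, box))
        elif "ANSWER:" in step:      # model committed to a final answer
            return step.split("ANSWER:", 1)[1].strip()
        prompt += "\n" + step        # accumulate the chain of thought
    return None  # no answer within the step budget
```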