Sihao Chen
@soshsihao
Researcher @ Microsoft #OAR. Learning AI models from experience. Previously: @upennnlp @cogcomp @GoogleAI. Opinions my own.
Life update: I defended my Ph.D. thesis and have joined @Microsoft's Office of Applied Research (OAR)! One big takeaway from my Ph.D. study is that research is all about translating ideas into impact. I feel blessed to work with talented researchers who share the same values!


🤔 We know what people are using LLMs for, but do we know how they collaborate with an LLM? 🔍 In a recent paper we answered this by analyzing multi-turn sessions across 21 million interaction logs from Microsoft Copilot for consumers and WildChat: arxiv.org/abs/2505.16023
We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? Introducing Log-Linear Attention with: - Log-linear time training - Log-time inference (in both time and memory) - Hardware-efficient Triton kernels
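A toy sketch of the two endpoints being interpolated (not the paper's Triton kernels): full softmax attention with quadratic cost vs. a linear-attention-style recurrence that keeps a fixed-size state. Log-linear attention sits between these extremes; the code below only shows the endpoints.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Causal softmax attention: O(T^2) time, state grows with sequence length.
    T = Q.shape[0]
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Linear-attention / SSM-style recurrence: O(T) time, constant-size state S.
    d = Q.shape[-1]
    S = np.zeros((d, d))   # accumulated key-value memory
    z = np.zeros(d)        # normalizer
    out = []
    for q, k, v in zip(Q, K, V):
        S += np.outer(phi(k), v)
        z += phi(k)
        out.append(phi(q) @ S / (phi(q) @ z + 1e-6))
    return np.array(out)

T, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```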
Huge congrats @sharma_ashish_2!!👏👏
🎓 Congrats to Ashish Sharma, @UW on receiving the ACM Doctoral Dissertation Award for his dissertation, "Human-AI Collaboration to Support Mental Health and Well Being." 👏 Honorable Mentions: Alexander Kelley, @UofIllinois Sewon Min, @UCBerkeley
Missing nuance in the collective realization today: The non-trivial negative result is not that "RL just amplifies skills that are already there with low probability". Duh, that's obvious and not an issue actually. What got questioned today is that "dumb pretraining teaches the…
🔥🔥Let's start cooking 😎😎
Excited to share that I’ll be interning @Microsoft Office of Applied Research this summer, working on reinforcement finetuning with the awesome @soshsihao and @ylongqi. Seattle friends, let’s catch up and chat about anything from alignment to inference-time scaling!
Join us if you want to work on next-gen collaborative, socially intelligent agents!
🙌We are looking for a full-time research scientist in Microsoft's Office of Applied Research! This person will help lead the development of next-generation AI technologies to support groups of people - not just individuals - in getting their work done.
LLMs naturally memorize some of their pre-training data verbatim. We study whether post-training can be an effective way to mitigate unintentional reproduction of pre-training data. 🛠️ No changes to pre-training or decoding 🔥 Training models to latently distinguish between memorized…
This is so true. LLM researchers seem to like to "specialize" in either pretraining or post training. Doing intense research on both sides does unlock something.
No LLM researcher should spend their whole life on one side of the pre/post training divide. The former teaches you what is actually happening, the latter reminds you what actually matters.
Want to 𝐜𝐮𝐭 𝐑𝐅𝐓 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐭𝐢𝐦𝐞 𝐛𝐲 𝐮𝐩 𝐭𝐨 𝟐× and boost performance? 🚀 Meet 𝑨𝒅𝒂𝑹𝑭𝑻 — a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithm (PPO, GRPO, REINFORCE). Less compute. Better results. 🧵 1/n
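A rough sketch of the adaptive-curriculum idea, not AdaRFT's exact algorithm: train on problems near a target difficulty and shift that target based on the policy's recent success rate. All names and constants below (target_success, step_size, the fake training step) are illustrative.

```python
import random

def pick_batch(problems, target_difficulty, batch_size):
    # problems: list of (problem_id, difficulty in [0, 1]); take those closest to the target.
    return sorted(problems, key=lambda p: abs(p[1] - target_difficulty))[:batch_size]

def adaptive_curriculum(problems, train_step, num_rounds=100,
                        batch_size=16, target_success=0.5, step_size=0.05):
    target_difficulty = 0.2  # start easy
    for _ in range(num_rounds):
        batch = pick_batch(problems, target_difficulty, batch_size)
        success_rate = train_step(batch)  # one RFT update (PPO/GRPO/...), returns mean reward
        # If the policy beats the target success rate, serve harder problems next round.
        target_difficulty += step_size * (success_rate - target_success)
        target_difficulty = min(max(target_difficulty, 0.0), 1.0)
    return target_difficulty

# Toy usage with a fake policy whose skill grows over time.
problems = [(i, random.random()) for i in range(1000)]
skill = [0.3]
def fake_train_step(batch):
    skill[0] = min(1.0, skill[0] + 0.01)
    return sum(1.0 for _, d in batch if d < skill[0]) / len(batch)

print(adaptive_curriculum(problems, fake_train_step))
```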
🚀 How well can LLMs know you and personalize their responses? Turns out, not so much! Introducing the PersonaMem Benchmark -- 👩🏻💻Evaluates LLMs' ability to understand an evolving persona from 180+ multi-session user-chatbot conversation histories 🎯Latest models (GPT-4.1, GPT-4.5,…
Will be at #NAACL this week. Let's talk if you are interested in RL, agents, and LLM post training in general!
Heading to #NAACL2025 w/ @peizNLP @soshsihao We are hiring full-time scientists on LLM post training, long-context reasoning, agents, and reinforcement finetuning. Please reach out if you are interested in chatting at the conference!
#ICLR2025 Oral LLMs often struggle with reliable and consistent decisions under uncertainty 😵💫 — largely because they can't reliably estimate the probability of each choice. We propose BIRD 🐦, a framework that significantly enhances LLM decision making under uncertainty. BIRD…
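For contrast, a naive baseline (not BIRD itself): estimating per-choice probabilities by repeated sampling, which is the kind of noisy, expensive estimate the tweet points at. The sampler below is a toy stand-in for a stochastic LLM call.

```python
from collections import Counter
import random

def estimate_choice_probs(sample_answer, choices, num_samples=50):
    # sample_answer() stands in for one stochastic LLM call returning a choice.
    counts = Counter(sample_answer() for _ in range(num_samples))
    return {c: counts[c] / num_samples for c in choices}

# Toy stand-in for an LLM that picks choice "A" 70% of the time.
choices = ["A", "B"]
probs = estimate_choice_probs(lambda: random.choices(choices, weights=[0.7, 0.3])[0], choices)
print(probs)
```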
📢 𝐖𝐢𝐥𝐝𝐅𝐞𝐞𝐝𝐛𝐚𝐜𝐤 A large-scale preference dataset built from 𝐫𝐞𝐚𝐥 𝐮𝐬𝐞𝐫 interactions with ChatGPT ✅ 𝟐𝟎𝐤+ preference pairs 🗣️ Built from 𝟏𝐌 chats 🔍 Annotated with 𝐝𝐢𝐚𝐥𝐨𝐠𝐮𝐞 𝐬𝐭𝐚𝐭𝐞, 𝐝𝐨𝐦𝐚𝐢𝐧, 𝐢𝐧𝐭𝐞𝐧𝐭, and more huggingface.co/datasets/micro…
🤖 Tired of slow tree searches on LLMs? 🚀 Check out our latest research on efficient tree search! 🔹 We introduce an upgraded transformer architecture that enables token-level self-reward modeling (TRM). 🔹 On top of that, we developed the Streaming Looking Ahead (SLA)…
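A generic beam-style tree search sketch (not the paper's TRM or SLA): expand partial sequences step by step and keep the continuations a reward function scores highest. Both the candidate generator and the reward below are toy stand-ins.

```python
import heapq

def tree_search(step_candidates, reward, beam_width=3, depth=5, prefix=()):
    # step_candidates(prefix) -> iterable of next tokens; reward(seq) -> float.
    beam = [prefix]
    for _ in range(depth):
        expanded = [seq + (tok,) for seq in beam for tok in step_candidates(seq)]
        if not expanded:
            break
        beam = heapq.nlargest(beam_width, expanded, key=reward)
    return max(beam, key=reward)

# Toy usage: the reward prefers sequences that alternate tokens.
vocab = ("a", "b")
best = tree_search(
    step_candidates=lambda seq: vocab,
    reward=lambda seq: sum(1 for x, y in zip(seq, seq[1:]) if x != y),
)
print(best)
```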