Taiwei Shi

@taiwei_shi

AI Researcher & Ph.D. student @nlp_usc. Intern @MSFTResearch. Formerly @GeorgiaTech @USC_ISI. NLP & Computational Social Science.

Los Angeles, CA

Joined November 2014

394Following

1KFollowers

Pinned

Taiwei Shi@taiwei_shi · May 2

Want to 𝐜𝐮𝐭 𝐑𝐅𝐓 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐭𝐢𝐦𝐞 𝐛𝐲 𝐮𝐩 𝐭𝐨 𝟐× and boost performance? 🚀 Meet 𝑨𝒅𝒂𝑹𝑭𝑻 — a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithms (PPO, GRPO, REINFORCE). Less compute. Better results. 🧵 1/n

taiwei_shi's tweet image. Want to 𝐜𝐮𝐭 𝐑𝐅𝐓 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐭𝐢𝐦𝐞 𝐛𝐲 𝐮𝐩 𝐭𝐨 𝟐× and boost performance? 🚀

Meet 𝑨𝒅𝒂𝑹𝑭𝑻 — a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithms (PPO, GRPO, REINFORCE).

Less compute. Better results. 🧵 1/n

360

361

42.0K

Taiwei Shi Retweeted

Longqi Yang@ylongqi · 3 h

Landed in Vienna for #ACL2025! We are hiring FTEs/Postdocs/Interns at Office of Applied Research to push the frontier of continuous model improvement for productivity, through RL*, inference time scaling, self reflection, memory etc. Available to chat this week w/ @mengtingwan.

487

Taiwei Shi@taiwei_shi · Jul 23

CoT transformed text reasoning. What about multimodal? 🤔 Check out our new dataset of interleaved text and image reasoning traces. We also show interesting visual CoT examples generated inherently by the model finetuned on our dataset!

MMicah Goldblum@micahgoldblum · Jul 23

🚨Announcing Zebra-CoT, a large-scale dataset of high quality interleaved image-text reasoning traces 📜. Humans often draw visual aids like diagrams when solving problems, but existing VLMs reason mostly in pure text. 1/n

895

Taiwei Shi@taiwei_shi · Jul 21

OpenAI over the years: 2022: Publishes papers in top-tier conferences 2023: Releases technical reports on arXiv 2024: Posts random blogs on its website 2025: "TRUST ME BRO!"

OOpenAI@OpenAI · Jul 19

We achieved gold medal-level performance 🥇on the 2025 International Mathematical Olympiad with a general-purpose reasoning LLM! Our model solved world-class math problems—at the level of top human contestants. A major milestone for AI and mathematics.

3.0K

Taiwei Shi Retweeted

Johnny Tian-Zheng Wei@johntzwei · Jul 10

Are you a researcher, trying to build a small GPU cluster? Did you already build one, and it sucks? I manage USC NLP’s GPU cluster and I’m happy to offer my expertise. I hope I can save you some headaches and make some friends. Please reach out!

8.0K

Taiwei Shi@taiwei_shi · Jul 8

Our paper on 𝐒𝐭𝐨𝐜𝐡𝐚𝐬𝐭𝐢𝐜 𝐄𝐫𝐫𝐨𝐫 𝐀𝐬𝐜𝐞𝐧𝐭 (𝐒𝐄𝐀) has been accepted to #COLM2025! 🎉 We introduce a scalable framework for uncovering LLM knowledge gaps with remarkable efficiency. Read more 👇 📄 Paper: arxiv.org/abs/2503.23361 💻 Code: github.com/limenlp/SEA

LLinxin Song@linxins2 · Apr 1

Want to know what your LLM don’t know? This is how 👇 Preprint: arxiv.org/abs/2503.23361 Code: github.com/uscnlp-lime/SEA

122

11.0K

Taiwei Shi@taiwei_shi · Jun 30

Happy to have contributed to this research that brings replacing me as a researcher one step closer. 😂

CCLS@ChengleiSi · Jun 30

Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.

514

Taiwei Shi@taiwei_shi · Jun 16

Glad to see multiple efforts highlighting the challenge of LLMs hallucinating on unanswerable math problems and the importance of abstention. Just a quick correction—we're from USC 😆.

PPolina Kirichenko@polkirichenko · Jun 16

Our results also align with concurrent work from UCLA @linxins2 @taiwei_shi @jieyuzhao11 which also observed reasoning LLMs hallucinate on unanswerable math problems! x.com/linxins2/statu… More evidence to argue that hallucination and failure to abstain is a big challenge in…

474

Taiwei Shi Retweeted

Dongwei Jiang@Dongwei__Jiang · Jun 16

🧵 Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it? We tested this systematically—and found they can't fully integrate feedback, even when the feedback is high-quality and backed by ground-truth.

159

11.0K

Taiwei Shi Retweeted

Omar Shaikh@oshaikh13 · Jun 9

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵

336

197

57.0K

Taiwei Shi Retweeted

Marktechpost AI Dev News ⚡@Marktechpost · Jun 6

Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement Finetuning Researchers from the University of Southern California developed the Synthetic Unanswerable Math (SUM) dataset. SUM introduces implicitly unanswerable math problems by…

3.0K

Taiwei Shi@taiwei_shi · Jun 4

Can your LLM truly understand and adapt to 𝑐𝑙𝑎𝑠ℎ𝑖𝑛𝑔 𝑐𝑜𝑚𝑚𝑢𝑛𝑖𝑡𝑦 𝑛𝑜𝑟𝑚𝑠? Introducing 𝐒𝐓𝐄𝐄𝐑-𝐁𝐄𝐍𝐂𝐇 🧭 — a large-scale benchmark to test 𝐬𝐭𝐞𝐞𝐫𝐚𝐛𝐢𝐥𝐢𝐭𝐲 across 30 highly contrasting online communities.

KKai Chen@kaichen23 · Jun 4

🤔How well do LLMs adapt to different norms? 🧵We introduce STEER-BENCH, a benchmark for assessing steerability in LLMs. 📉 Human: 81% | Top LLM: ~65% 🚨 Norm alignment ≠ solved. 📄 Paper: arxiv.org/abs/2505.20645 @ZihaoHe95 @taiwei_shi @KristinaLerman

952

Taiwei Shi@taiwei_shi · May 27

Is there anything that Qwen cannot do at this point? 😂

1.0K

205

86.0K

Taiwei Shi@taiwei_shi · May 23

Excited to share that I’ll be interning @Microsoft Office of Applied Research this summer, working on reinforcement finetuning with the awesome @soshsihao and @ylongqi. Seattle friends, let’s catch up and chat anything from alignment to inference-time scaling!

taiwei_shi's tweet image. Excited to share that I’ll be interning @Microsoft Office of Applied Research this summer, working on reinforcement finetuning with the awesome @soshsihao and @ylongqi. Seattle friends, let’s catch up and chat anything from alignment to inference-time scaling!

5.0K

Taiwei Shi@taiwei_shi · May 19

Now accepted by #ACL2025! Thrilled to see our paper also referenced in @lilianweng's latest blog post on reasoning in LLMs! Check it out: lilianweng.github.io/posts/2025-05-…

DDongwei Jiang@Dongwei__Jiang · Oct 3

Process supervision for reasoning is 🔥! While previous approaches often relied on human annotation and struggled to generalize across different reasoning tasks, we're now asking: Can we improve this? Introducing 𝐑𝐀𝐓𝐈𝐎𝐍𝐀𝐋𝐘𝐒𝐓: a new model pre-trained on implicit…

6.0K