Simon Shaolei Du
@SimonShaoleiDu
Assistant Professor @uwcse. Postdoc @the_IAS. PhD in machine learning @mldcmu.
Can transformers analyze code efficiently? ✅ Yes. We prove transformers efficiently handle real compiler tasks (AST construction, symbol resolution, type inference) using only logarithmic size, while RNNs require size linear in the input length. Paper: arxiv.org/abs/2410.14706 #COLM2025
🚨 Code is live! Check out LoRe – a modular, lightweight codebase for personalized reward modeling from user preferences. 📦 Few-shot personalization 📊 Benchmarks: TLDR, PRISM, PersonalLLM 👉 github.com/facebookresear… Huge thanks to @AIatMeta for open-sourcing this research 🙌
🧠 Your LLM should model how you think, not reduce you to preassigned traits 📢 Introducing LoRe: a low-rank reward modeling framework for personalized RLHF ❌ Demographic grouping/handcrafted traits ✅ Infers implicit preferences ✅ Few-shot adaptation 📄 arxiv.org/abs/2504.14439
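For the curious, here is a minimal numpy sketch of the low-rank idea as I read it (not the released LoRe code): each user's reward is a user-specific weighted combination of a small set of shared basis reward functions, so personalizing to a new user only means fitting a K-dimensional weight vector from a handful of labeled comparisons. The feature vectors, basis heads, and the toy user below are all made up for illustration.

```python
# Hypothetical sketch of low-rank personalized reward modeling (not the official LoRe code).
# Assumption: each response is summarized by a feature vector phi of dimension d, and K
# shared basis reward heads have already been trained on pooled preference data.
import numpy as np

rng = np.random.default_rng(0)
d, K = 16, 4                       # feature dim, number of shared basis rewards
W_basis = rng.normal(size=(K, d))  # shared basis reward heads (pretrained in practice)

def basis_rewards(phi):
    """K basis reward scores for one response feature vector."""
    return W_basis @ phi           # shape (K,)

def user_reward(phi, user_w):
    """Personalized reward = low-rank combination of basis rewards."""
    return user_w @ basis_rewards(phi)

def fit_user_weights(pref_pairs, lr=0.1, steps=200):
    """Few-shot adaptation: fit only a K-dim weight vector from (chosen, rejected) pairs
    with a Bradley-Terry (logistic) preference loss."""
    w = np.zeros(K)
    for _ in range(steps):
        grad = np.zeros(K)
        for phi_chosen, phi_rejected in pref_pairs:
            diff = basis_rewards(phi_chosen) - basis_rewards(phi_rejected)  # (K,)
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))    # P(chosen preferred over rejected)
            grad += (p - 1.0) * diff                 # gradient of -log p w.r.t. w
        w -= lr * grad / len(pref_pairs)
    return w

# Toy usage: a "user" who likes basis reward 0 and dislikes basis reward 1.
true_w = np.array([1.0, -1.0, 0.0, 0.0])
pairs = []
for _ in range(8):                                   # only 8 labeled comparisons
    a, b = rng.normal(size=d), rng.normal(size=d)
    chosen, rejected = (a, b) if true_w @ basis_rewards(a) > true_w @ basis_rewards(b) else (b, a)
    pairs.append((chosen, rejected))
w_hat = fit_user_weights(pairs)
print("recovered user weights (up to scale):", np.round(w_hat, 2))
```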
I'll present StoryEval tomorrow at CVPR, happy to catch up with new and old friends! 📍 ExHall D, Poster #284 ⌚ 10:30 am - 12:30 pm on 6.14
Can the current best T2V generative models (Veo2, Kling, Sora, Gen-3, Pika, Hailuo, ...) completely present short stories like “How to Put an Elephant in a Refrigerator”? 🐘 Not yet! Simple stories containing multiple sequential events, such as “opens the refrigerator door” 🚪,…
Excited to share our work led by @ypwang61: RLVR with only ONE training example can boost accuracy on MATH500 by 37%.
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks! 📍RLVR with one training example can boost: - Qwen2.5-Math-1.5B: 36.0% → 73.6% - Qwen2.5-Math-7B: 51.0% → 79.2% on MATH500. 📄 Paper: arxiv.org/abs/2504.20571…
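A toy sketch of what 1-shot RLVR means mechanically, under my own simplifying assumptions (this is not the paper's training code): repeatedly sample a group of answers to the single training question, score each with a binary verifiable reward, and take a policy-gradient step against the group-mean baseline. A 4-way softmax stands in for the LLM; the candidate answers are hypothetical.

```python
# Toy illustration of RLVR with a single training example (my sketch, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
candidates = ["10", "12", "14", "16"]   # hypothetical candidate answers for the ONE question
ground_truth = "14"
logits = np.zeros(len(candidates))      # "policy" parameters

def policy_probs():
    p = np.exp(logits - logits.max())
    return p / p.sum()

def verifiable_reward(answer):
    """Binary verifiable reward: 1 iff the answer matches the known ground truth."""
    return 1.0 if answer == ground_truth else 0.0

lr, group_size = 0.5, 8
for step in range(50):
    probs = policy_probs()
    idxs = rng.choice(len(candidates), size=group_size, p=probs)   # rollouts for the same prompt
    rewards = np.array([verifiable_reward(candidates[i]) for i in idxs])
    baseline = rewards.mean()                                      # group-mean baseline
    for i, r in zip(idxs, rewards):
        advantage = r - baseline
        grad_logpi = np.eye(len(candidates))[i] - policy_probs()   # d log pi(a_i) / d logits
        logits += lr * advantage * grad_logpi / group_size         # policy-gradient ascent

print("P(correct answer) after training:", round(float(policy_probs()[candidates.index(ground_truth)]), 3))
```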
1/6 Current AI agent training methods fail to capture diverse behaviors needed for human-AI cooperation. GOAT (Generative Online Adversarial Training) uses online adversarial training to explore a pre-trained generative model's latent space to generate realistic yet challenging…
Check out our new work using online multi-agent RL for LM safety.
🤔Conventional LM safety alignment is reactive: find vulnerabilities→patch→repeat 🌟We propose 𝗼𝗻𝗹𝗶𝗻𝗲 𝐦𝐮𝐥𝐭𝐢-𝐚𝐠𝐞𝐧𝐭 𝗥𝗟 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵
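Roughly, the training loop has the shape sketched below (my reading of the setup, not the authors' code): an attacker policy and a defender policy are updated online with opposing rewards from a judge, so each side keeps adapting to the other instead of being patched once. Here both "LMs" are tiny categorical policies over a few styles and the judge is a one-line rule, purely to show the shape of the co-evolution loop.

```python
# Structural sketch of online attacker/defender self-play (toy stand-in, not the real setup).
import numpy as np

rng = np.random.default_rng(0)
n_styles = 4
att_logits = np.zeros(n_styles)   # attacker policy over "attack styles"
def_logits = np.zeros(n_styles)   # defender policy over "defense styles"

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(logits, action, reward, lr=0.2):
    """One online policy-gradient (REINFORCE) update on a categorical policy."""
    grad_logpi = np.eye(len(logits))[action] - softmax(logits)
    return logits + lr * reward * grad_logpi

for step in range(2000):                             # both agents co-evolve online
    attack = rng.choice(n_styles, p=softmax(att_logits))
    defense = rng.choice(n_styles, p=softmax(def_logits))
    unsafe = 1.0 if attack != defense else 0.0       # toy judge: uncovered attack succeeds
    att_logits = reinforce_step(att_logits, attack, unsafe)         # attacker seeks failures
    def_logits = reinforce_step(def_logits, defense, 1.0 - unsafe)  # defender patches them

print("attacker mix:", softmax(att_logits).round(2))
print("defender mix:", softmax(def_logits).round(2))
```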
Oral @icmlconf !!! Can't wait to share our work and hear the community's thoughts on it, should be a fun talk! Can't thank my collaborators enough: @cogscikid @liangyanchenggg @SimonShaoleiDu @maxhkw @natashajaques
Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity. Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks. shorturl.at/fqsNN 🧵
Congratulations to @UW #UWAllen Ph.D. grads @sharma_ashish_2 & @sewon__min, @TheOfficialACM Doctoral Dissertation Award honorees! Sharma won for #AI tools for mental health; Min received honorable mention for efficient, flexible language models. #ThisIsUW news.cs.washington.edu/2025/06/04/all…
PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or reward. Paper: arxiv.org/abs/2505.19770 Led by @smellycat_ZZZ @MinhakSong
Two-stage RLHF or one-stage DPO: Which one is better for learning from preferences? Equal under strong assumptions, but representation differences break the tie. Our paper reveals their fine-grained performance gaps under various conditions. paper: arxiv.org/abs/2505.19770
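As a quick reminder of the two objects being compared (an illustrative sketch, not the paper's code): two-stage RLHF first fits a Bradley-Terry reward model on preference pairs and then maximizes it under a KL penalty (e.g., with PPO), while DPO folds both stages into a single loss on the policy's and reference policy's log-probabilities. The toy numbers below are made up.

```python
# Side-by-side sketch of the two objectives (illustrative only).
import torch
import torch.nn.functional as F

beta = 0.1  # KL / temperature coefficient shared by both formulations

# --- Stage 1 of RLHF: Bradley-Terry reward-model loss on a (chosen, rejected) pair ---
def reward_model_loss(r_chosen, r_rejected):
    return -F.logsigmoid(r_chosen - r_rejected)

# --- Stage 2 of RLHF (objective only): maximize E[r(x, y)] - beta * KL(pi || pi_ref), e.g. with PPO ---

# --- DPO: same preference data, but the implicit reward is beta * log(pi / pi_ref) ---
def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected):
    implicit_r_chosen = beta * (logp_chosen - ref_logp_chosen)
    implicit_r_rejected = beta * (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(implicit_r_chosen - implicit_r_rejected)

# Toy numbers: the policy already prefers the chosen response slightly more than the reference does.
print(dpo_loss(torch.tensor(-5.0), torch.tensor(-7.0),
               torch.tensor(-5.5), torch.tensor(-6.5)))
```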
Our new paper tries to uncover what we really need when applying RLVR.
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: +28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
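Concretely, "spurious rewards" just swaps the reward signal attached to each rollout while leaving the RLVR training loop untouched. A sketch of the three variants named in the post (my illustration, not the released code); they would plug into the same policy-gradient loop as in the one-example sketch above.

```python
# The three reward variants compared above; the training loop itself is unchanged.
import random

def ground_truth_reward(answer, gold):
    return 1.0 if answer == gold else 0.0           # standard RLVR reward

def incorrect_reward(answer, gold):
    return 1.0 - ground_truth_reward(answer, gold)  # rewards only wrong answers

def random_reward(answer, gold, p=0.5):
    return 1.0 if random.random() < p else 0.0      # ignores the answer entirely

rollout = {"answer": "14", "gold": "14"}
for name, fn in [("ground-truth", ground_truth_reward),
                 ("incorrect", incorrect_reward),
                 ("random", random_reward)]:
    print(name, fn(rollout["answer"], rollout["gold"]))
```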
Even with the same vision encoder, generative VLMs (LLaVA) can extract more information than CLIP. Why? Check out our #ACL2025NLP paper led by @SitingLi627: arxiv.org/pdf/2411.05195
Excited to share that our paper "Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder" is accepted to #ACL2025! Preprint: arxiv.org/pdf/2411.05195 Thanks so much to @SimonShaoleiDu and @PangWeiKoh for your support and guidance throughout the journey!
Famous LLM researcher Bruce Lee quote: "I fear not the LLM who has practiced 10,000 questions once, but I fear the LLM who has practiced one question 10,000 times."
So excited to announce our work was accepted as a Spotlight paper to @icmlconf !!! I'm looking forward to presenting our work there this summer and @cogsci_soc! Big thank you again to my collaborators @cogscikid @liangyanchenggg @SimonShaoleiDu @maxhkw @natashajaques
The sampler is crucial for faster convergence of online DPO! Check out our paper: arxiv.org/abs/2409.19605 #ICLR2025
Previous works study the sample complexity of DPO and emphasize the role of samplers in online DPO. But what about their role in optimization convergence rates? Check out our paper at #ICLR2025 on the convergence rates of online DPO with various samplers! ArXiv: arxiv.org/pdf/2409.19605.
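To see where the sampler enters, here is a toy online DPO loop on a 4-armed bandit (my sketch; the sampler names below are generic placeholders, not the paper's exact schemes): each iteration draws a response pair from the current policy according to some sampler, labels the pair with a preference oracle, and takes one DPO gradient step.

```python
# Toy online DPO on a bandit with 4 "responses"; swap the sampler to compare behaviors.
import torch
import torch.nn.functional as F

def uniform_sampler(policy_probs):
    """Sample two responses i.i.d. from the current policy."""
    dist = torch.distributions.Categorical(policy_probs)
    return dist.sample().item(), dist.sample().item()

def greedy_vs_sample_sampler(policy_probs):
    """Alternative choice: pair the current greedy response with a sampled one."""
    greedy = int(torch.argmax(policy_probs))
    other = torch.distributions.Categorical(policy_probs).sample().item()
    return greedy, other

def dpo_step(logits, ref_logits, sampler, true_reward, beta=0.1, lr=0.5):
    probs = torch.softmax(logits, dim=-1)
    i, j = sampler(probs.detach())
    if i == j:                                   # need a distinct pair to form a preference
        return logits
    chosen, rejected = (i, j) if true_reward[i] >= true_reward[j] else (j, i)
    logp = torch.log_softmax(logits, dim=-1)
    ref_logp = torch.log_softmax(ref_logits, dim=-1)
    margin = beta * ((logp[chosen] - ref_logp[chosen]) - (logp[rejected] - ref_logp[rejected]))
    loss = -F.logsigmoid(margin)
    grad, = torch.autograd.grad(loss, logits)
    return (logits - lr * grad).detach().requires_grad_(True)

true_reward = torch.tensor([0.0, 0.2, 0.9, 0.1])  # arm 2 is the best "response"
ref_logits = torch.zeros(4)
logits = torch.zeros(4, requires_grad=True)
for _ in range(500):                              # try greedy_vs_sample_sampler here to compare
    logits = dpo_step(logits, ref_logits, uniform_sampler, true_reward)
print("policy after online DPO (uniform sampler):", torch.softmax(logits, -1).round(decimals=2))
```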
Excited to share our new work led by @kjha02: scaling training to more diverse environments is key to human-AI cooperation!