Dongwei Jiang
@Dongwei__Jiang
Working on LLMs, with a focus on reasoning and self-improvement. Spent six years in a past life doing industry research on speech processing.
🧵 Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it? We tested this systematically—and found they can't fully integrate feedback, even when the feedback is high-quality and backed by ground-truth.
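For concreteness, the loop being probed here can be sketched in a few lines. This is a minimal sketch, not the paper's harness: `generate`, `get_feedback`, and `is_correct` are placeholder callables you would supply.

```python
# Minimal sketch of iterative refinement with external feedback.
# generate(prompt) -> str, get_feedback(answer, gold) -> str, is_correct(answer, gold) -> bool
# are placeholders, not the paper's actual code.
def refine_with_feedback(prompt, gold, generate, get_feedback, is_correct, max_rounds=5):
    answer = generate(prompt)
    for round_idx in range(max_rounds):
        if is_correct(answer, gold):
            return answer, round_idx  # feedback fully incorporated
        feedback = get_feedback(answer, gold)  # high-quality, ground-truth-backed
        revision_prompt = (
            f"{prompt}\n\nYour previous answer:\n{answer}\n\n"
            f"Feedback:\n{feedback}\n\nPlease revise your answer."
        )
        answer = generate(revision_prompt)
    return answer, max_rounds  # still wrong after repeated feedback: "feedback friction"
```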

Incredibly grateful to @TheOfficialACM SIGPLAN for awarding #LeanLang the Programming Languages Software Award 2025 at #PLDI2025! 🎉 "The Lean theorem prover is a remarkable software artifact... Lean has had and continues to have a broad impact on industrial practice and…
🚨🚨 New paper out with @Dongwei__Jiang and team: Even with near-perfect, ground-truth feedback, LLMs often fail to fully integrate it. We call this "feedback friction"—a key barrier to self-improvement. x.com/Dongwei__Jiang…
🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions. We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations. 🧵 1/n
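One way to make the "drop in refusal behavior" measurable is to compare refusal rates on unanswerable questions before and after finetuning. The sketch below is an illustration, assuming a placeholder `answer_question(model, q)` inference call and a keyword-based refusal check, not the paper's exact metric.

```python
# Sketch: refusal rate on unanswerable questions for a given model variant.
REFUSAL_MARKERS = ("i don't know", "cannot be answered", "not enough information",
                   "i'm not sure", "unanswerable")

def is_refusal(response):
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(model, unanswerable_questions, answer_question):
    refusals = sum(is_refusal(answer_question(model, q)) for q in unanswerable_questions)
    return refusals / len(unanswerable_questions)

# A lower refusal_rate(rft_model, ...) than refusal_rate(base_model, ...), on questions
# that have no correct answer, is the "hallucination tax" pattern described above.
```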
We've been thinking about this gap too! Our paper (arxiv.org/abs/2404.04298) found that when verifiable environments aren't available, LLMs are no better at discriminating among their previously generated alternatives than they are at generating initial responses.
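The generation-vs-discrimination gap can be stated as two accuracies over the same questions: answer directly, or sample k candidates and let the same model pick one. The sketch below is generic, with `sample_answer` and `pick_best` as placeholder prompting functions rather than the paper's code.

```python
# Sketch: generation accuracy vs. discrimination accuracy for one model.
def generation_accuracy(model, dataset, sample_answer, is_correct):
    return sum(is_correct(sample_answer(model, q), gold) for q, gold in dataset) / len(dataset)

def discrimination_accuracy(model, dataset, sample_answer, pick_best, is_correct, k=5):
    correct = 0
    for q, gold in dataset:
        candidates = [sample_answer(model, q) for _ in range(k)]
        chosen = pick_best(model, q, candidates)  # model judges its own samples
        correct += is_correct(chosen, gold)
    return correct / len(dataset)

# The Self-[In]Correct observation: without a verifiable environment, the second
# number is not reliably higher than the first.
```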
Now accepted by #ACL2025! Thrilled to see our paper also referenced in @lilianweng's latest blog post on reasoning in LLMs! Check it out: lilianweng.github.io/posts/2025-05-…
Process supervision for reasoning is 🔥! While previous approaches often relied on human annotation and struggled to generalize across different reasoning tasks, we're now asking: Can we improve this? Introducing 𝐑𝐀𝐓𝐈𝐎𝐍𝐀𝐋𝐘𝐒𝐓: a new model pre-trained on implicit…
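As a generic illustration of step-level (process) supervision, and not RATIONALYST's actual training recipe, one can score each intermediate reasoning step and use the scores to rerank candidate chains; `score_step` below is a placeholder for whatever step-level scorer is available.

```python
# Generic process-supervision sketch: score each step given the question and the
# steps before it, then rank full chains by their weakest step.
def process_score(question, steps, score_step):
    scores = [score_step(question, steps[:i], step) for i, step in enumerate(steps)]
    return min(scores) if scores else 0.0

def rerank_chains(question, candidate_chains, score_step):
    return max(candidate_chains, key=lambda steps: process_score(question, steps, score_step))
```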
Excited to be presenting our paper on training language models under heavily imbalanced data tomorrow at #NAACL2025! If you want to chat about data curation for both pre- and post-training, feel free to reach out! 📝 arxiv.org/abs/2410.04579 📅 11-12:30am, Fri, May 2 📍 Hall 3
"Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets" arxiv.org/abs/2410.04579 TLDR—When pre-training on imbalanced data, "Upsampling" and loss "Upweighting" are often assumed equivalent. (1)We show they behave differently. (2) Using this, we propose…
Current copyright mitigation methods for LLMs typically focus on average-case risks, but overlook worst-case scenarios involving long verbatim copying ⚠️. We propose BloomScrub 🧽, a method providing certified mitigation of worst-case infringement while preserving utility.
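The worst-case risk described here, long verbatim copying, can be illustrated with a toy n-gram overlap check. This is a generic sketch rather than BloomScrub's method (the Bloom-filter framing is only a guess from the name), and a plain set stands in for a space-efficient filter.

```python
# Toy sketch: flag long verbatim spans shared between generated text and a protected corpus.
def ngrams(text, n):
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_index(corpus_docs, n=8):
    index = set()  # at scale this would be a Bloom filter or similar sketch structure
    for doc in corpus_docs:
        index.update(ngrams(doc, n))
    return index

def longest_copied_span(generated, index, n=8):
    """Length in tokens of the longest run of consecutive corpus n-grams in `generated`."""
    best = run = 0
    for gram in ngrams(generated, n):
        run = run + 1 if gram in index else 0
        best = max(best, run)
    return best + n - 1 if best else 0
```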
Reasoning to Learn from Latent Thoughts "Motivated by how humans apply deliberate thinking to learn from limited data, we train an LM to infer (or “decompress”) latent thoughts underlying the highly compressed observed data. These synthesized latent thoughts augment the raw…
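The augmentation idea quoted above can be sketched generically: have an LM infer a "latent thought" for each raw document, then train on the thought followed by the document. `infer_thought` below is a placeholder for that LM call, not the paper's pipeline.

```python
# Sketch: augment raw pretraining documents with synthesized latent thoughts.
def augment_with_latent_thoughts(raw_docs, infer_thought):
    augmented = []
    for doc in raw_docs:
        thought = infer_thought(doc)  # "what reasoning would produce this text?"
        augmented.append(f"<thought>\n{thought}\n</thought>\n{doc}")
    return augmented
```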
"Verification, The Key to AI." Read the archives of Rich Sutton (Turing Award winner :D); they have all the major ideas.
This isn't quite true. Test-time compute helps when verification is easier than generation (e.g., sudoku), but if the task is "When was George Washington born?" and you don't know, no amount of thinking will get you to the correct answer. You're bottlenecked by verification.
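The Sudoku example makes the asymmetry concrete: checking a completed grid takes a few lines, while producing one is a search problem.

```python
# Verifying a completed 9x9 Sudoku grid is cheap and mechanical, unlike generating one.
def is_valid_sudoku(grid):
    """grid: 9x9 list of lists of digits 1-9; True iff it is a valid solution."""
    expected = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3) for bc in range(0, 9, 3)
    ]
    return all(group == expected for group in rows + cols + boxes)
```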
I'll be at #AAAI25 presenting my poster on Self-[In]Correct (arxiv.org/abs/2404.04298) during Session 3 on March 1st at 12:30. Would love to connect if you're attending!