Dongkeun Yoon
@dongkeun_yoon
PhD student @kaist_ai, research intern @LG_AI_Research. Researching multilinguality in LLMs.
🙁 LLMs are overconfident even when they are dead wrong. 🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”? ❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.
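For anyone who wants to poke at this setup themselves, here is a minimal sketch of the usual verbalized-confidence evaluation: ask the model to append a stated confidence to its answer, parse it, and score calibration with ECE. The prompt format, regex, and bin count are my assumptions for illustration, not the paper's exact protocol.

```python
# Minimal sketch (not the paper's exact protocol): parse a verbalized
# confidence like "Confidence: 60%" from model output and compute ECE.
import re
import numpy as np

def parse_confidence(response: str) -> float:
    """Extract a stated confidence such as 'Confidence: 60%' from the output text."""
    m = re.search(r"confidence\s*[:=]?\s*(\d{1,3})\s*%", response, re.IGNORECASE)
    return min(int(m.group(1)), 100) / 100 if m else 0.5  # fall back to 0.5 if absent

def expected_calibration_error(confs, corrects, n_bins=10):
    """Standard ECE: |accuracy - mean confidence| per bin, weighted by bin size."""
    confs, corrects = np.asarray(confs), np.asarray(corrects, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confs > lo) & (confs <= hi)
        if mask.any():
            ece += mask.mean() * abs(corrects[mask].mean() - confs[mask].mean())
    return ece
```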

🥳Excited to share that I'll be joining @unccs as a postdoc this fall. Looking forward to working with @mohitban47 & the amazing students at @unc_ai_group. I'll continue working on retrieval, aligning knowledge modules with LLMs' parametric knowledge, and expanding to various modalities.
🚀 Tower+: our latest model in the Tower family — sets a new standard for open-weight multilingual models! We show how to go beyond sentence-level translation, striking a balance between translation quality and general multilingual capabilities. 1/5 arxiv.org/pdf/2506.17080
Check out the latest iteration of Tower models, Tower+. Ideal for translation tasks and beyond, and available at three different scales: 2B, 9B, 72B. All available on huggingface: huggingface.co/collections/Un… Kudos to everyone involved!
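If you want to try one of the checkpoints, a quick-start along these lines should work; the model id below is a placeholder I made up for illustration, so check the linked Hugging Face collection for the exact repo names at each scale (2B, 9B, 72B).

```python
# Quick-start sketch with transformers; the model id is a placeholder,
# confirm the real repo names on the Hugging Face collection linked above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Unbabel/Tower-Plus-9B"  # placeholder id for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Translate the following sentence into Portuguese:\nThe weather is lovely today."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```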
🚨 Want models to better utilize and stay grounded in the provided knowledge? We introduce Context-INformed Grounding Supervision (CINGS)! Training LLMs with CINGS significantly boosts grounding abilities in both text and vision-language models compared to standard instruction tuning.
🚨 New Paper 🧵 How effectively do reasoning models reevaluate their thoughts? We find that:
- Models excel at identifying unhelpful thoughts but struggle to recover from them
- Smaller models can be more robust
- Self-reevaluation ability is far from true meta-cognitive awareness
🚨 Excited to share that our paper was accepted to #ACL2025 Findings 🎉 "When Should Dense Retrievers Be Updated in Evolving Corpora? Detecting Out-of-Distribution Corpora Using GradNormIR" Huge thanks to my amazing collaborators! 🙌 @jinyoung__kim @ohmyksh We propose…
[1/6] Ever wondered why Direct Preference Optimization is so effective for aligning LLMs? 🤔 Our new paper dives deep into the theory behind DPO's success, through the lens of information gain. Paper: "Differential Information: An Information-Theoretic Perspective on Preference…
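As a refresher (not part of the thread itself): this is the standard DPO objective the paper reinterprets, in the usual notation with policy π_θ, reference model π_ref, preferred/rejected responses y_w, y_l, and temperature β.

```latex
% Standard DPO objective (the quantity being analyzed), with
% \pi_\theta the policy, \pi_{\mathrm{ref}} the reference model,
% (x, y_w, y_l) a prompt with chosen/rejected responses, and \beta a temperature.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```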
Check out our latest work on self-improving LLMs, where we test whether LLMs can use their internal self-consistency as a reward signal to bootstrap themselves with RL. TL;DR: they can, to some extent, but they eventually end up reward hacking the self-consistency objective. We try to see…
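A toy sketch of what a self-consistency reward can look like (this is an illustration, not the paper's exact recipe): sample several answers per prompt and reward each sample by how often its final answer agrees with the others.

```python
# Illustrative self-consistency reward: each sampled answer is rewarded by the
# fraction of samples that produced the same final answer (majority agreement).
from collections import Counter

def self_consistency_rewards(answers: list[str]) -> list[float]:
    """Reward = fraction of samples sharing this sample's final answer."""
    counts = Counter(answers)
    total = len(answers)
    return [counts[a] / total for a in answers]

# Example: 5 samples, 3 agree on "42"
print(self_consistency_rewards(["42", "42", "41", "42", "7"]))
# -> [0.6, 0.6, 0.2, 0.6, 0.2]
```

The obvious failure mode, and the one the thread points to, is that a policy can maximize this reward by collapsing to the same answer everywhere, i.e. reward hacking the consistency objective rather than getting more correct.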
🚨 New Paper co-led with @bkjeon1211 🚨 Q. Can we adapt Language Models, trained to predict the next token, to reason at the sentence level? I think LMs operating at a higher level of abstraction would be a promising path towards advancing their reasoning, and I am excited to share our…
New preprint 📄 (with @jinho___park ) Can neural nets really reason compositionally, or just match patterns? We present the Coverage Principle: a data-centric framework that predicts when pattern-matching models will generalize (validated on Transformers). 🧵👇
Imagine you’re collaborating with an AI co-scientist: you ask it to proofread your manuscript and flag any errors. Which LLM would you choose? 🤔 We evaluated the new Claude 4 models on SPOT. It looks like o3 is still the best model for this.
❓What if your RAG didn’t need a separate retrieval model at all? We present 🧊FREESON, a new framework for retriever-FREE retrieval-augmented reasoning. With FREESON, a single LRM acts as both generator and retriever, shifting the focus from seq2seq matching to locating…
Congrats to the team for this fantastic work! Had a chance to try the code on my reasoning VLM and found consistent results. x.com/smellslikeml/s…
Tried out the code using SpaceThinker-Qwen2.5-VL-3B. Plots indicate a steady increase in accuracy and confidence, and a decrease in calibration error, as CoT length increases. Fitting TriviaQA with linear regression, slopes: 0.042, -0.034, -0.02, 0.011, all statistically significant.
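For reference, the kind of fit described above can be reproduced with a simple linear regression of the metric on CoT length and a check that the slope's p-value clears significance. The arrays below are placeholder values, not the actual SpaceThinker/TriviaQA data.

```python
# Sketch: regress a metric (accuracy, confidence, or calibration error) on CoT
# length and test whether the slope is statistically significant.
import numpy as np
from scipy.stats import linregress

cot_lengths = np.array([64, 128, 256, 512, 1024])       # CoT lengths (assumed units)
accuracy    = np.array([0.41, 0.46, 0.52, 0.55, 0.60])  # placeholder values

fit = linregress(cot_lengths, accuracy)
print(f"slope={fit.slope:.4f}, p={fit.pvalue:.3g}")  # slope > 0 with p < 0.05 => "statsig"
```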
[CL] Reasoning Models Better Express Their Confidence D Yoon, S Kim, S Yang, S Kim... [KAIST & CMU & UCL] (2025) arxiv.org/abs/2505.14489
Turns out that reasoning models not only excel at solving problems but are also excellent confidence estimators - an unexpected side effect of long CoTs! This reminds me that smart ppl are good at determining what they know & don't know👀 Check out @dongkeun_yoon 's post!
Reasoning models are quite verbose in their thinking process. Is it any good? We find that it enables reasoning models to be more accurate in telling what they know and don't know (confidence)! Even non-reasoning models can do this better if they mimic the verbose reasoning! 👀