Yu Feng
@AnnieFeng6
CS PhD Student @Penn | NLP & ML @cogcomp @upennnlp @duke_nlp @ RUC | 🧗🏻‍♀️🎨🩰🎹
#ICLR2025 Oral LLMs often struggle to make reliable and consistent decisions under uncertainty 😵‍💫, largely because they can't reliably estimate the probability of each choice. We propose BIRD 🐦, a framework that significantly enhances LLM decision-making under uncertainty. BIRD…
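For intuition on the failure mode the tweet describes, here is a minimal sketch of the naive baseline: estimating per-choice probabilities from raw sample frequencies. This is my illustration, not the BIRD method (which the truncated tweet doesn't spell out), and `query_llm` is a hypothetical stand-in for whatever model call you use.

```python
from collections import Counter

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in: return one answer choice from your model/API."""
    raise NotImplementedError("wire up a real LLM call here")

def estimate_choice_probs(prompt: str, choices: list[str], n_samples: int = 50) -> dict[str, float]:
    """Naive baseline: per-choice probabilities as raw sample frequencies.

    Such frequencies are often poorly calibrated, which is the
    unreliability the tweet points to and that BIRD aims to fix.
    """
    counts = Counter(query_llm(prompt).strip() for _ in range(n_samples))
    total = sum(counts[c] for c in choices) or 1
    return {c: counts[c] / total for c in choices}
```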

🚀Excited to introduce BOW: A novel RL framework that rethinks vanilla next-word prediction as reasoning path exploration! Across 10 benchmarks, we show BOW leads to better zero-shot capabilities and next-word reasoning. 📄Paper: arxiv.org/pdf/2506.13502 🧵Details below
👥 We’re looking for reviewers for the COLM 2025 Workshop on AI Agents: Capabilities & Safety @COLM_conf! 🔗 Sign up: forms.gle/5vHzyGxjUgSMNK… Help shape exciting research on AI agents, their capabilities, and the safety challenges they raise. 🧠 #AI #AIagents #COLM2025…
🚨COLM 2025 Workshop on AI Agents: Capabilities and Safety @COLM_conf This workshop explores AI agents’ capabilities—including reasoning and planning, interaction and embodiment, and real-world applications—as well as critical safety challenges related to reliability, ethics,…
🤖💬 Herding instincts… in AIs? Yes, even LLMs can follow the crowd! • 📉 Conformity ↑ when agents lack confidence but trust peers • 🧠 Presentation format shapes peer influence • 🎯 Controlled herding can boost collaboration outcomes 👉 Read more: arxiv.org/abs/2505.21588

Excited to share our papers at #ICLR2025 in Singapore! Check out the summaries on our blog (ccgblog.seas.upenn.edu/2025/04/ccg-pa…), and then check out the papers at oral session 1B (BIRD) and poster session 2 (for all three)! @AnnieFeng6, @XingyuFu2, @BenZhou96, @muhao_chen, @DanRothNLP
Excited to share work from my @togethercompute internship—a deep dive into inference‑time scaling methods 🧠 We rigorously evaluated verifier‑free inference-time scaling methods across both reasoning and non‑reasoning LLMs. Some key findings: 🔑 Even with huge rollout budgets,…
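For context, the simplest verifier-free inference-time scaling method in this family is majority voting over sampled rollouts (self-consistency). A minimal sketch, assuming `sample_answer` is a hypothetical model call at nonzero temperature; this is a generic illustration, not the study's code:

```python
from collections import Counter
from typing import Callable

def majority_vote(sample_answer: Callable[[str], str], prompt: str, n_rollouts: int = 16) -> str:
    """Verifier-free scaling: sample n_rollouts answers and return the most
    frequent one. Evaluations like the one above probe how accuracy changes
    as the rollout budget grows."""
    answers = [sample_answer(prompt) for _ in range(n_rollouts)]
    return Counter(answers).most_common(1)[0][0]
```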
Can GPT-4V and Gemini-Pro perceive the world the way humans do? 🤔 Can they solve the vision tasks that humans can in the blink of an eye? 😉 tldr; NO, they are far worse than us 💁🏻♀️ Introducing BLINK👁 zeyofu.github.io/blink/, a novel benchmark that studies visual perception…
BLINK: Multimodal Large Language Models Can See but Not Perceive We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations. Most of the Blink tasks can be solved by humans…