Yu Su
@ysu_nlp
Prof.@OhioState, co-director @osunlp. author of Mind2Web, SeeAct, MMMU, HippoRAG, BioCLIP, UGround. manifesting my thinking of intelligence into language agents
Sharing the slides of my talk at Princeton yesterday, "A holistic and critical look at language agents": ysu1989.github.io/resources/lang… LLM-based language agents are exciting, but it's also undeniably a quite chaotic space: are agents the next big thing, or are they just thin wrappers…

Announcing the @NeurIPSConf 2025 workshop on Imageomics: Discovering Biological Knowledge from Images Using AI! The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego! #NeurIPS2025
Impressive results. Can’t wait to try.
Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
🚀 Call for Papers — @NeurIPSConf 2025 Workshop Multi-Turn Interactions in LLMs 📅 December 6/7 · 📍 San Diego Convention Center Join us to shape the future of interactive AI. Topics include but are not limited to: 🧠 Multi-Turn RL for Agentic Tasks (e.g., web & GUI agents,…
We’re thrilled to share our latest work: FLEXITOKENS! In this work, we introduce language models with learnable tokenizers that make tokenization truly flexible during adaptation. See example below ↓ 1/n
📢Check out this paper led by my amazing student, @AbrahamOwos, on making tokenizers more "flexible" during adaptation to tasks, domains, and languages. There's been a lot of interest in removing BPE tokenizers from LLMs by directly (learning to) chunk byte sequences. All these…
Attending #ICML2025 🇨🇦 this week! I’ll be co-organizing the Computer Use Agent Workshop @workshopcua on July 19th! Happy to chat about anything related to language agents — especially world modeling, scaling RL for agents, and multi-turn RL. Excited to meet old friends and…
Huan and I are looking for a postdoc to join us on agent research (broadly defined: planning, reasoning, safety, memory, continual learning, etc.). If you have a strong record in this space, drop us an email with CV! Retweet appreciated.
🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,…
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
Thrilled to announce that our work Online-Mind2Web has been accepted to @COLM_conf ! 🎉 It's my first PhD work and first paper at COLM. See you in Montreal! 🍁 Several teams are already testing their agents on Online-Mind2Web. If you're curious about how your agent performs, try…
🚀Exciting update about our work! "An Illusion of Progress? Assessing the Current State of Web Agents." ✨ What’s New? 🆕 Claude Computer Use 3.7 performance analysis. 🆕 WebJudge, powered by o4-mini, achieves a remarkable 3.8% success rate gap with human judgment, demonstrating…
🧐Curious how far Claude Research can go in freeing you from tedious daily tasks? 🚀Check out our new results on Mind2Web 2! 💡 Looking forward to seeing even better agentic search systems! 🙌 Join the effort and test your system on Mind2Web 2 today!
🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -…
Our study led by @ChengleiSi reveals an “ideation–execution gap” 😲 Ideas from LLMs may sound novel, but when experts spend 100+ hrs executing them, they flop: 💥 👉 human‑generated ideas outperform on novelty, excitement, effectiveness & overall quality!
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
Agentic search systems for web-scale information face an evaluation crisis due to their growing complexity and long, dynamic tasks. Mind2Web 2 provides a benchmark of 130 realistic, long-horizon tasks and a novel Agent-as-a-Judge framework to rigorously evaluate these systems.…
🧐Agentic search is revolutionizing how we gather information, but how reliable is it? Can it really deliver accurate answers with proper source attribution? 🚀Super excited to share our new work, Mind2Web 2, a rigorous agentic search benchmark with 130 realistic and…
Rigorously evaluating agentic systems has been one of our pursuits at @osunlp, with prior efforts including Mind2Web and ScienceAgentBench. Today we introduce Mind2Web 2 to evaluate the emerging Deep Research-like agents: It features realistic and diverse long-horizon web…