CLS
@ChengleiSi
PhDing @stanfordnlp | teaching language models to do research | real AGI is the friends we made along the way
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.

🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE…
IOI 2025 is next week. How many AI teams will get a gold medal?
Life Update: I will join @UTiSchool as an Assistant Professor in Fall 2026 and will continue my work on LLM, HCI, and Computational Social Science. I'm building a new lab on Human-Centered AI Systems and will be hiring PhD students in the coming cycle!
I'm sadly not at #IC2S2 😭, but I will be at #ACL2025 in Vienna ☕️ next week!! Please spread the word that I'm recruiting prospective PhD students: lucy3.notion.site/for-prospectiv…
Watching the model solve these IMO problems and achieve gold-level performance was magical. A few thoughts 🧵
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
TL;DR: When you add a system prompt asking the model to act "based", it might act based.
Update on where @grok has been & what happened on July 8th. First off, we deeply apologize for the horrific behavior that many experienced. Our intent for @grok is to provide helpful and truthful responses to users. After careful investigation, we discovered the root cause…
When are AI-designed drugs making it to patients? No AI-designed medicine is on pharmacy shelves yet, but the first wave of molecules is now in Phase 2/early Phase 3 (chiefly rentosertib for IPF). Here's where we are, a thread👇
Are you a researcher, trying to build a small GPU cluster? Did you already build one, and it sucks? I manage USC NLP’s GPU cluster and I’m happy to offer my expertise. I hope I can save you some headaches and make some friends. Please reach out!
Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data…
Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵
People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true? In our study (arxiv.org/pdf/2507.00432), we…
July 4th break in our #AI4Science seminar series. Join us next week for a talk by @ChengleiSi on the epic 2-year experiment evaluating (and executing!) AI-generated scientific ideas. lu.ma/9qq72ebt
LLMs can generate research ideas that look more novel than humans’, but are they actually better? Stanford ran a study where LLM- and human-authored ideas were put to the test. Human ideas were blindly rated consistently better, with LLM ideas seeing 37× larger score drops post-execution
ChengLei has the most creative research projects: he had PhD students execute AI research ideas for months
Amazing follow-up work! After finding that AI research ideas were judged (by human experts) better than human ideas... They tested it by actually executing the research projects! Turns out human ideas are better (judges were wrong!) – but only narrowly & not statistically…
“Finally, maybe this is controversial but ultimately progress in science is bottlenecked by real-world experiments.” If this is controversial in SF, we’re cooked.
We don’t have AI self-improvement yet, and when we do it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff”, but rather extremely gradual across many years, probably a decade. The first thing to know is that…
Can AI ideas hold up in the lab? This study from Stanford says not as well as human ones, but there's hope. With enough training/reasoning, I'm pretty sure LLMs could nail 'small-scale discoveries', though not Nobel-level stuff. Great work @ChengleiSi @tatsu_hashimoto @Diyi_Yang
guys LLMs are trailing by barely half a peer-review point at doing research
LLM research ideas look shiny on paper but slip when someone actually builds the project. This 131-page work checks whether those projects still look strong once experts run every experiment. It shows a clear drop in quality for the LLM ideas, which means judging ideas only at…