Jiayi Geng
@JiayiiGeng
Incoming CS PhD @LTIatCMU | MSE @Princeton_nlp @PrincetonPLI @cocosci_lab @PrincetonCS. Working on multi-agent systems, cognitive science & LLMs
Using LLMs to build AI scientists is all the rage now (e.g., Google’s AI co-scientist [1] and Sakana’s Fully Automated Scientist [2]), but how much do we understand about their core scientific abilities? We know how LLMs can be vastly useful (solving complex math problems) yet…
Check out this cool video (made by @theryanliu) for our #icml25 paper, "Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse"🤗
A short 📹 explainer video on how LLMs can overthink in humanlike ways 😲! Had a blast presenting this at #icml2025 🥳
In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires. 📄 Read the full paper: arxiv.org/abs/2410.21333…
One of the better posters I saw today at #icml25. This gets at the root of the problems we were thinking about when we conceived and wrote the CoT paper.
1/ So much of privacy research is designing post-hoc methods to make models memorization-free. It’s time we turn that around with architectural changes. Excited to add Memorization Sinks to the transformer architecture this #ICML2025 to isolate memorization during LLM training🧵
🧐Check out our poster 11 am today @ West-320!
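The Memorization Sinks thread above is light on mechanism, so here is a minimal sketch of one way the idea could look in code, assuming a per-document gate over a reserved slice of MLP neurons. `MemSinkMLP`, the hash-based routing, and all parameter choices are my own illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of a "memorization sink": reserve a slice of MLP
# neurons per transformer block and gate them per-document, so
# sequence-specific memorization is routed into neurons that can be
# zeroed out at inference time. Routing scheme is an assumption.
import torch
import torch.nn as nn

class MemSinkMLP(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_sink=256):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.n_sink = n_sink  # last n_sink neurons act as the sink

    def forward(self, x, doc_id=None, use_sinks=True):
        h = torch.relu(self.up(x))
        mask = torch.ones(h.size(-1), device=h.device)
        if not use_sinks:
            mask[-self.n_sink:] = 0.0  # drop all sink neurons at inference
        elif doc_id is not None:
            # Each document activates only a subset of the sink neurons
            # (toy hash-based routing; the paper's routing may differ).
            g = torch.zeros(self.n_sink, device=h.device)
            start = (doc_id * 2654435761) % self.n_sink
            idx = torch.tensor(
                [(start + i) % self.n_sink for i in range(self.n_sink // 4)],
                device=h.device)
            g[idx] = 1.0
            mask[-self.n_sink:] = g
        return self.down(h * mask)

# Toy usage: training routes doc 42's memorization into its sink slice;
# inference zeroes every sink neuron.
mlp = MemSinkMLP()
x = torch.randn(2, 16, 512)
y_train = mlp(x, doc_id=42)
y_infer = mlp(x, use_sinks=False)
```

The appeal of this shape of design is that "isolating memorization" becomes an architectural property rather than a post-hoc scrub: whatever the gated neurons absorbed during training can simply be switched off.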
Chain of thought can hurt LLM performance 🤖 Verbal (over)thinking can hurt human performance 😵💫 Do they happen at the same times, for the same reasons? Come find out at our poster at West-320 ⏰11am tomorrow! #ICML2025
People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true? In our study (arxiv.org/pdf/2507.00432), we…
What will software development look like in 2026? With coding agents rapidly improving, dev roles may look quite different. My current workflow has changed a lot:
- Work in GitHub, not IDEs
- Agents in parallel
- Write English, not code
- More code review
Thoughts + a video👇
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
I'm thrilled to share that I've moved to Pittsburgh and joined NeuLab at CMU as a research intern this summer, advised by @gneubig! I'll also start my PhD @LTIatCMU this fall. Feel free to reach out if you're interested in chatting about multi-agent systems, LLMs for scientific…
🚨 70 million US workers are about to face their biggest workplace transformation due to AI agents. But nobody asks them what they want. While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵
New on the Anthropic Engineering blog: how we built Claude’s research capabilities using multiple agents working in parallel. We share what worked, what didn't, and the engineering challenges along the way. anthropic.com/engineering/bu…
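The post above describes an orchestrator fanning work out to subagents that run in parallel; that pattern is easy to sketch. Everything below (`orchestrate`, `run_subagent`, the hard-coded query decomposition) is a hypothetical stand-in, not Anthropic's actual implementation.

```python
# Minimal sketch of the orchestrator/parallel-subagent pattern: a lead
# agent decomposes a query into subtasks, fans them out to concurrent
# workers, then synthesizes the results. All names are placeholders.
import asyncio

async def search(subquery: str) -> str:
    # Stand-in for a subagent's tool loop (web search, retrieval, etc.).
    await asyncio.sleep(0.1)
    return f"findings for {subquery!r}"

async def run_subagent(subquery: str) -> str:
    # Each subagent works on its slice of the question independently,
    # with its own context window.
    return await search(subquery)

async def orchestrate(query: str) -> str:
    # A real lead agent would ask an LLM to decompose the query;
    # the decomposition is hard-coded here for illustration.
    subtasks = [f"{query} (aspect {i})" for i in range(3)]
    results = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    # Synthesis step: in the real system an LLM merges the findings.
    return "\n".join(results)

if __name__ == "__main__":
    print(asyncio.run(orchestrate("state of multi-agent LLM research")))
```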
LLMs are helpful for scientific research, but will they continue to be helpful? Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new, and 38%+ projection of future (arxiv.org/abs/2505.24302).
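For intuition on the three numbers quoted above, here is a toy sketch of how preservation / acquisition / projection rates could be computed from per-question correctness judgments after a knowledge update; the data layout and function names are my assumptions, not ScienceMeter's actual protocol.

```python
# Hypothetical sketch of the three rates named in the tweet, given
# boolean correctness judgments on three question sets.
def rate(results: list[bool]) -> float:
    return sum(results) / len(results) if results else 0.0

def science_meter(prior: list[bool], new: list[bool], future: list[bool]) -> dict:
    return {
        "preservation": rate(prior),   # still correct on pre-update science QA
        "acquisition": rate(new),      # correct on newly added papers
        "projection": rate(future),    # correct on yet-unseen follow-up work
    }

# Example with toy judgments: 75% preserved, ~67% acquired, ~33% projected.
print(science_meter(
    prior=[True, True, True, False],
    new=[True, True, False],
    future=[True, False, False],
))
```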
This year, there has been growing evidence that AI agents can conduct scientific research and produce papers end-to-end, with some generated papers already accepted at top-tier conferences/workshops. Intology’s…
The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI…