Sayash Kapoor

@sayashk

CS PhD candidate @PrincetonCITP and senior fellow at @Mozilla. I tweet about agents, evaluation, reproducibility, AI for science. Book: http://aisnakeoil.com

Princeton

Joined March 2015

2K Following

10K Followers

Pinned

Sayash Kapoor@sayashk · Jul 17

The mainstream view of AI for science says AI will rapidly accelerate science, and that we're on track to cure cancer, double the human lifespan, colonize space, and achieve a century of progress in the next decade. In a new AI Snake Oil essay, @random_walker and I argue that…

228

151

32.0K

Sayash Kapoor Retweeted

Arvind Narayanan@random_walker · Jun 10

The origin story of “AI as Normal Technology”, and lessons learned Many people have asked how the “AI as Normal Technology” paper came to be. This paper has been an (ongoing) journey for me and @sayashk in developing not just the substance of our arguments but also learning how…

133

21.0K

Sayash Kapoor Retweeted

Avijit Ghosh@evijitghosh · Jul 15

New blog post alert! 🚨"What is the Hugging Face Community Building?", with @YJernite and @IreneSolaiman The AI narrative focuses on big players, but the real story is happening in the open source AI ecosystem across 1.8M models, 450K datasets, and 560K apps, on @huggingface.

5.0K

Sayash Kapoor Retweeted

Arvind Narayanan@random_walker · Jul 18

If we compared AI capabilities against humans with no access to tools, such as the internet, we would probably find that AI already outperformed humans at many or most cognitive tasks we perform at work. But of course this is not a helpful comparison and doesn’t tell us much…

119

13.0K

Sayash Kapoor@sayashk · Jul 17

In the running for my favorite blog post from Sayash and Arvind! When people ask me for areas I am most excited about for AI, my answer is often some version of "science-first AI-for-science". There is a lot to do to figure out what that means and how to do it well.

Sayash Kapoor@sayashk · Jul 17

6.0K

Sayash Kapoor Retweeted

Arvind Narayanan@random_walker · Jul 17

We ourselves are enthusiastic users of AI in our scientific workflows. On a day-to-day basis, it all feels very exciting. But the impact of AI on science as an institution, rather than individual scientists, is a different question that demands a different kind of analysis.…

6.0K

Sayash Kapoor Retweeted

Arvind Narayanan@random_walker · Jul 17

Some aspects of AI discourse seem to come from a different planet, oblivious to basic realities on Earth. AI for science is one such area. In this new essay, @sayashk and I argue that visions of accelerating science through AI should be considered unserious if they don't confront…

246

220

65.0K

Sayash Kapoor Retweeted

Kenny Peng@kennylpeng · Jul 3

Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using responses of >350 LLMs. We find substantial correlation. On one dataset, LLMs agree on the wrong answer ~2x more than they would at random. 🧵(1/7)

211

163

18.0K

Sayash Kapoor@sayashk · Jul 10

After we invented the dynamo, it took us 40 years to electrify factories. In the process, we had to redesign the entire factory layout — electrifying existing factories didn't cut it. Software engineering will likewise need to undergo drastic changes to truly benefit from AI.…

METR@METR_Evals · Jul 10

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

195

27.0K

Sayash Kapoor Retweeted

METR@METR_Evals · Jul 10

236

1.0K

7.0K

3.0K

3.5M

Sayash Kapoor Retweeted

Steve Newman@snewmanpv · Jul 10

How much time do AI coding tools save? @METR_Evals just released a rigorous study with a startling result: developers take 19% longer to complete tasks when using AI! The result is consistent with the idea that AI tools are most helpful for routine work in small projects,…

8.0K

Sayash Kapoor Retweeted

Carnegie India@CarnegieIndia · Jul 10

🎙️ New #InterpretingIndia episode! @NidhiSinghLive joins @sayashk to explore the hype, hope, and hazards of artificial intelligence. From flawed predictions to grounded policymaking, Kapoor makes the case for treating #AI as “normal technology.” Tune in: carnegieindia.org/podcasts/inter…

720

Sayash Kapoor Retweeted

Daniel Kang@daniel_d_kang · Jul 8

As AI agents near real-world use, how do we know what they can actually do? Reliable benchmarks are critical but agentic benchmarks are broken! Example: WebArena marks "45+8 minutes" on a duration calculation task as correct (real answer: "63 minutes"). Other benchmarks…

20.0K

Sayash Kapoor Retweeted

Jordan McGillis@jordanmcgillis · Jun 30

AI tools can detect truck driver fatigue and prevent deadly crashes. But the Teamsters are blocking their rollout. My latest @WSJ:

278

72.0K

Sayash Kapoor Retweeted

Arvind Narayanan@random_walker · Jun 30

When coding with agents, my ideal GUI for context engineering would look like this. Key features: * Visually pick, resize, reorder what goes into the context. * The user is not forced to do all this manually; the agent is capable of auto-populating and the user can review/tweak.…

8.0K

Sayash Kapoor Retweeted

Arvind Narayanan@random_walker · Jun 9

A post by Stripe engineer @thegautam on building a successful payments foundation model for fraud detection recently went viral. I want to talk about how unusual this particular use case is, which helps understand why such "instant wins" from deploying advanced AI are so rare. As…

12.0K