Peter Jansen ( @peterjansen-ai.bsky.social )
@peterjansen_ai
Associate Professor @uarizona; Visiting Scientist @allen_ai, AI/NLP; DiscoveryWorld; EntailmentBank; ScienceWorld; http://textgames.org list. Tweets/opinions my own
Can language models perform end-to-end scientific discovery? In our NeurIPS Spotlight paper, we show: very rarely. Our best model found <20% of discoveries, our best PhDs found nearly all. Paper: arxiv.org/pdf/2406.06769 Code/Web: allenai.github.io/discoveryworld @allen_ai @MSFTResearch

🚨 We're hiring a #ResearchScientist in #AI for Scientific Discovery at Ai2! Are you passionate about intelligent agents, data-driven discovery, and AI systems that accelerate science? Join us in shaping the future of research. 🧬🧠 Apply now: job-boards.greenhouse.io/thealleninstit…
Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using responses of >350 LLMs. We find substantial correlation. On one dataset, LLMs agree on the wrong answer ~2x more than they would at random. 🧵(1/7)
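A minimal sketch of how one might quantify "agreeing on the wrong answer more than at random" under a multiple-choice assumption; this is an illustration, not the paper's actual metric or code:

```python
import itertools

def wrong_answer_agreement(responses, gold, n_options=4):
    """Compare how often two models give the SAME wrong answer against
    what uniform-random errors would produce.

    responses: dict model_name -> list of chosen options (one per question)
    gold: list of correct options
    n_options: answer choices per question (hypothetical MCQ setup)
    """
    same_wrong, both_wrong = 0, 0
    for m1, m2 in itertools.combinations(responses, 2):
        for a1, a2, g in zip(responses[m1], responses[m2], gold):
            if a1 != g and a2 != g:        # both models err on this question
                both_wrong += 1
                same_wrong += (a1 == a2)   # ...and pick the identical wrong option
    observed = same_wrong / both_wrong
    chance = 1.0 / (n_options - 1)         # uniform guess over the wrong options
    return observed, observed / chance     # ratio ~2 would mean "2x more than random"

# toy usage with two hypothetical models on three questions
gold = ["A", "B", "C"]
responses = {"model_1": ["B", "B", "D"], "model_2": ["B", "A", "D"]}
print(wrong_answer_agreement(responses, gold))
```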
We’ve upgraded ScholarQA, our agent that helps researchers conduct literature reviews efficiently by providing detailed answers. Now, when ScholarQA cites a source, it won’t just tell you which paper it came from–you’ll see the exact quote, highlighted in the original PDF. 🧵
Honored to receive the Outstanding Position Paper Award at @icmlconf :) Come attend my talk and poster tomorrow on human-centered considerations for a safer and better future of work. I will be recruiting PhD students at @stonybrooku @sbucompsc this coming fall. Please get in touch.
Very excited for a new #ICML2025 position paper accepted as oral w @mbodhisattwa & @TuhinChakr! 😎 What are the longitudinal harms of AI development? We use economic theories to highlight AI’s intertemporal impacts on livelihoods & its role in deepening labor-market inequality.
Two weeks ago, Marco Rubio said USAID “has little to show since the end of the Cold War.” Days earlier, a Lancet study estimated that USAID global health programs have saved 90 million lives—not since 1991, but since just 2001.
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
📢New conference where AI is the primary author and reviewer! agents4science.stanford.edu Current venues don't allow AI-written papers, so it's hard to assess the +/- of such works🤔 #Agents4Science solicits papers where AI is the main author w/ human advisors. 💡Initial reviews by…
🤝Excited to announce @ProjectBiomni × @AnthropicAI! AI agents are set to transform how biologists do everyday research. Thanks to this partnership, the platform is now free for scientists worldwide: biomni.stanford.edu Learn more: anthropic.com/customers/biom…
We are so excited to announce a new open-source challenge in collaboration with @proximafusion: unlocking fusion with AI. If you haven't followed, fusion is how the sun makes energy and is, in the long term, our best bet for clean, safe, and virtually limitless energy. In the…
Introducing Fractional Reasoning: a mechanistic method to quantitatively control how much thinking an LLM performs. tl;dr: we identify latent reasoning knobs in the transformer embedding space ➡️ a better inference-compute approach that mitigates under- and over-thinking arxiv.org/pdf/2506.15882
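For intuition, here is a generic activation-steering sketch in PyTorch: a hypothetical "reasoning direction" is scaled by a fraction and added to a middle layer's hidden states. The model, layer index, and direction are all placeholders and this is not the paper's implementation, only the flavor of latent-knob control it describes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                      # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical direction in the residual stream correlated with "amount of
# reasoning"; in practice it would be estimated from data, not random.
reasoning_dir = torch.randn(model.config.hidden_size)
reasoning_dir /= reasoning_dir.norm()

alpha = 0.5   # the "fraction": <1 dials thinking down, >1 dials it up

def steer(module, inputs, output):
    # Add a scaled copy of the direction to every token's hidden state.
    hs = output[0] if isinstance(output, tuple) else output
    hs = hs + alpha * reasoning_dir.to(hs.dtype)
    return (hs,) + output[1:] if isinstance(output, tuple) else hs

layer = model.transformer.h[6]           # middle layer, arbitrary choice
handle = layer.register_forward_hook(steer)

ids = tok("Q: What is 17 * 24?\nA:", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```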
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
Introducing SciArena, a platform for benchmarking models across scientific literature tasks. Inspired by Chatbot Arena, SciArena applies a crowdsourced LLM evaluation approach to the scientific domain. 🧵
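For context, Chatbot Arena-style leaderboards typically aggregate pairwise human votes with an Elo-style rating. Whether SciArena uses exactly this scheme is an assumption here; the basic update looks like this:

```python
def elo_update(r_a, r_b, winner, k=32):
    """Update ratings after models A and B are compared on one task.
    winner is 'A', 'B', or 'tie'; k controls how fast ratings move."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = {"A": 1.0, "B": 0.0, "tie": 0.5}[winner]
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

print(elo_update(1200, 1250, "A"))  # the lower-rated model wins and gains rating
```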
Anthropic staff realized they could ask Claude to buy things that weren’t just food & drink. After someone randomly decided to ask it to order a tungsten cube, Claude ended up with an inventory full of (as it put it) “specialty metal items” that it later sold at a loss.
Today we’re releasing a prototype of Genesys, an autonomous multi-agent LLM discovery system that aims to discover new types of language model architectures. We found Genesys can discover novel architectures competitive with the industry-standard transformer. 🧵
Verrrrry intriguing-looking and labor-intensive test of whether LLMs can come up with good scientific ideas. After implementing those ideas, the verdict seems to be "no, not really."
RAG and in-context learning are the go-to approaches for integrating new knowledge into LLMs, but they make inference very inefficient. We propose instead 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗠𝗼𝗱𝘂𝗹𝗲𝘀: lightweight LoRA modules trained offline that can match RAG performance without the drawbacks
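A rough sketch of the general idea using the peft library: train a small LoRA adapter offline on a document's text, then load that adapter at inference instead of retrieving the document into the prompt. The base model, hyperparameters, and the example document below are placeholders, not the paper's setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "gpt2"  # placeholder base model
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the frozen base model with a lightweight LoRA adapter.
cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, cfg)

# Hypothetical "new knowledge" document to be baked into the module offline.
document = "The Zephyr-7 turbine entered service in 2021 and produces 3.4 MW."
ids = tok(document, return_tensors="pt")

# Offline: a few language-modeling steps on the document train only the adapter weights.
opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4)
for _ in range(20):
    out = model(input_ids=ids.input_ids, labels=ids.input_ids)
    out.loss.backward()
    opt.step()
    opt.zero_grad()

# Later: load this saved module at inference instead of doing retrieval.
model.save_pretrained("knowledge_modules/zephyr7")
```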