Ai2

@allen_ai

Breakthrough AI to solve the world's biggest problems. › Join us: https://allenai.org/careers › Newsletter: https://tinyurl.com/3vc2r2m8

Seattle, WA

Joined September 2015

404 Following

71K Followers

Pinned

Ai2@allen_ai · Jul 9

Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵

412

185

315.0K

Ai2@allen_ai · 19 h

issues w preference LM benchmarks
🐡 data contains cases where the "bad" response is just as good as chosen one
🐟 model rankings can feel off (claude ranks lower than expected)
led by @cmalaviya11 (TACL 2025), we study underspecified queries & detrimental effect on model evals

Ai2@allen_ai · 21 h

In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that adding just a bit of missing context can reorder model leaderboards—and surface hidden biases. 🧵👇

4.0K

Ai2@allen_ai · Jul 18

Excited to share what I have been focusing on this year! Inference-time search to optimize Bayesian surprise pushes us towards long-horizon discovery! Introducing "AutoDS": Autonomous Discovery via Surprisal. "It can not only find the diamond in the rough, but also can rule out…

Ai2@allen_ai · Jul 18

Great science starts with great questions. 🤔✨ Meet AutoDS—an AI that doesn’t just hunt for answers, it decides which questions are worth asking. 🧵

171

110

16.0K

Ai2@allen_ai · Jul 17

A new model enters SciArena. 👀 Welcome Moonshot AI's Kimi K2! SciArena lets you benchmark models across scientific literature tasks, applying a crowdsourced LLM evaluation approach to the scientific domain. 🧪 Learn more and try SciArena here: sciarena.allen.ai

4.0K

Ai2@allen_ai · Jul 16

You can now jump from Scholar QA answers to highlighted evidence in the source paper's pdf : )

Ai2@allen_ai · Jul 16

We’ve upgraded ScholarQA, our agent that helps researchers conduct literature reviews efficiently by providing detailed answers. Now, when ScholarQA cites a source, it won’t just tell you which paper it came from–you’ll see the exact quote, highlighted in the original PDF. 🧵

3.0K