Ludwig Schmidt
@lschmidt3
Assistant professor at @Stanford and member of the technical staff at @AnthropicAI.
I'm a big fan of the approach to research funding @andykonwinski and the Laude team are taking! Working with them on terminal-bench has been fantastic (thanks @alexgshaw!) and I'm excited that they're going to support more open, impact-oriented research.
Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity. Built for and by researchers, including @JeffDean & @jpineau1 on the board, @LaudeInstitute catalyzes research with real-world impact.
Evaluating agents on benchmarks is a pain. Each benchmark comes with its own harness, scoring scripts, and environments, and integrating one can take days. We're introducing the Terminal-Bench dataset registry to solve this problem. Think of it as the npm of agent benchmarks. Now…
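To make the registry idea concrete, here is a minimal, hypothetical sketch of what a package-manager-style interface for benchmarks could look like. The BenchmarkRegistry, register, and load names below are illustrative assumptions, not the actual Terminal-Bench API.

```python
# Hypothetical sketch of a benchmark registry (illustrative only; not the real Terminal-Bench API).
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Benchmark:
    name: str
    version: str
    # Function that builds the benchmark's tasks/environment when requested.
    loader: Callable[[], object]


@dataclass
class BenchmarkRegistry:
    _benchmarks: Dict[str, Benchmark] = field(default_factory=dict)

    def register(self, bench: Benchmark) -> None:
        # Index benchmarks by "name@version", the way package managers do.
        self._benchmarks[f"{bench.name}@{bench.version}"] = bench

    def load(self, spec: str):
        # Resolve a spec like "toy-bench@0.1" and build its tasks.
        return self._benchmarks[spec].loader()


registry = BenchmarkRegistry()
registry.register(Benchmark("toy-bench", "0.1", loader=lambda: ["task-1", "task-2"]))
print(registry.load("toy-bench@0.1"))
```

The point of the sketch is the resolution step: an agent harness asks for a benchmark by name and version and gets back a ready-to-run environment, instead of hand-integrating each benchmark's own scripts.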
Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔 We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats! arxiv.org/abs/2506.04689
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data…
Cool to see more work on data for AI agents!
Agentic AI will transform every enterprise–but only if agents are trusted experts. The key: Evaluation & tuning on specialized, expert data. I’m excited to announce two new products to support this–@SnorkelAI Evaluate & Expert Data-as-a-Service–along w/ our $100M Series D! ---…
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
Very excited about our new agent benchmark! I think it's a nice way of evaluating how well agents can do complex tasks in terminal (command-line) environments.
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr…
40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
📢 Announcing our data-centric workshop at ICML 2025 on unifying data curation frameworks across domains! 📅 Deadline: May 24, AoE 🔗 Website: dataworldicml2025.github.io We have an amazing lineup of speakers + panelists from various institutions and application areas.
SAIL is still accepting applications for the SAIL Postdoctoral Fellowships! This is an opportunity to work with our wonderful professors and community. Applications received by the end of April 30 will receive full consideration: ai.stanford.edu/postdoctoralfe…
Want to learn the engineering details of building state-of-the-art Large Language Models (LLMs)? Not finding much info in @OpenAI’s non-technical reports? @percyliang and @tatsu_hashimoto are here to help with CS336: Language Modeling from Scratch, now rolling out to YouTube.
Stanford AI Lab (SAIL) is excited to announce new SAIL Postdoctoral Fellowships! We are looking for outstanding candidates excited to advance the frontiers of AI with our professors and vibrant community. Applications received by the end of April 30 will receive full…
Turns out, it’s possible to outperform DeepSeekR1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by selecting quality instructions from diverse sources. 🧵 (1/n)
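As a rough illustration of what SFT-only distillation on open data involves, here is a minimal sketch using Hugging Face's trl SFTTrainer. The dataset and model IDs, hyperparameters, and data format are assumptions for illustration, not the OpenThinker2 recipe.

```python
# Minimal SFT sketch (illustrative; not the OpenThinker2 training setup or hyperparameters).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed dataset/model IDs; also assumes the dataset is in a text or chat format
# that SFTTrainer can consume directly.
dataset = load_dataset("open-thoughts/OpenThoughts2-1M", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",   # base model to fine-tune
    train_dataset=dataset,              # curated reasoning traces as SFT targets
    args=SFTConfig(
        output_dir="openthinker-sft",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        bf16=True,
    ),
)
trainer.train()
```

The claim in the thread is about the data, not the training loop: with carefully selected, verified instruction data, plain supervised fine-tuning like the sketch above can match models trained with RL on top.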
Very nice community progress on open-data reasoning models since the R1 release!
Announcing OpenThinker-32B: the best open-data reasoning model distilled from DeepSeek-R1. Our results show that large, carefully curated datasets with verified R1 annotations produce SoTA reasoning models. Our 32B model outperforms all 32B models including…