Percy Liang

@percyliang

Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist

Stanford, CA

Joined October 2009

417Following

80KFollowers

Pinned

Percy Liang@percyliang · May 19

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:

percyliang's tweet image. What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:

203

1.0K

424

146.0K

Pinned

Percy Liang@percyliang · Jun 20

🚀 We just launched RoboArena — a real-world evaluation platform for robot policies! Think Chatbot Arena, but for robotics. 📝 Paper: robo-arena.github.io/assets/roboare… 🌐 Website: robo-arena.github.io Joint work with @pranav_atreya and @KarlPertsch. advised by @percyliang,…

KKarl Pertsch@KarlPertsch · Jun 20

We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵

15.0K

Percy Liang@percyliang · 2 h

Llama 3.1 must love Harry Potter!

AAhmed Ahmed@AhmedSQRD · 7 h

Prompting Llama 3.1 70B with the “Mr and Mrs. D” can generate seed the generation of a near-exact copy of the entire ~300 page book ‘Harry Potter & the Sorcerer’s Stone’ 🤯 We define a “near-copy” as text that is identical modulo minor spelling / punctuation variations. Below…

3.0K

Percy Liang Retweeted

Together AI@togethercompute · 21 h

🧠 Qwen3 just leveled up on Together AI 🚀 Qwen3-235B-A22B-Instruct-2507-FP8 isn't just another model update - it's a leap forward 📈

11.0K

Percy Liang Retweeted

Together AI@togethercompute · Jul 17

Together AI Sets a New Bar: Fastest Inference for DeepSeek-R1-0528 We’ve upgraded the Together Inference Engine to run on @NVIDIA Blackwell GPUs—and the results speak for themselves: 📈 Highest known serverless throughput: 334 tokens/sec 🏃‍Fastest time to first answer token:…

102

36.0K

Percy Liang Retweeted

Together AI@togethercompute · Jul 17

Most AI benchmarks test the past. But real intelligence is about predicting the future. Introducing FutureBench — a new benchmark for evaluating agents on real forecasting tasks that we developed with @huggingface 🔍 Reasoning > memorization 📊 Real-world events 🧠 Dynamic,…

25.0K

Percy Liang Retweeted

Sang Truong@sangttruong · Jul 16

The adaptive testing is integrated into HELM: crfm-helm.readthedocs.io/en/latest/reev… HELM integration blog: crfm.stanford.edu/2025/06/04/rel… You can run jobs with a single command with HELM! We thank Yifan Mai, @percyliang, and @StanfordCRFM for their help! 🧵6/9

3.0K

Percy Liang@percyliang · Jul 16

The time I invested in learning Jax has been paid back ten-fold both in TPU FLOPs and great infra from @dlwh. Would recommend (esp. if you join us @ marin.community)

GGoogle AI Developers@googleaidevs · Jul 16

.@StanfordCRFM's Marin project has released the first fully open model in JAX. It’s an 'open lab' sharing the entire research process - including code, data, and logs, to enable reproducibility and further innovation. developers.googleblog.com/en/stanfords-m…

12.0K

Percy Liang Retweeted

Chenchen Gu@chenchenygu · Jul 14

Prompt caching lowers inference costs but can leak private information from timing differences. Our audits found 7 API providers with potential leakage of user data. Caching can even leak architecture info—OpenAI's embedding model is likely a decoder-only Transformer! 🧵1/9

140

18.0K

Percy Liang@percyliang · Jul 12

heading to @icmlconf #ICML2025 next week! come say hi & i'd love to learn about your work :) i'll present this paper (arxiv.org/abs/2503.17514) on the pitfalls of training set inclusion in LLMs, Thursday 11am here are my talk slides to flip through: ai.stanford.edu/~kzliu/files/m…

KKen Liu@kenziyuliu · Apr 3

An LLM generates an article verbatim—did it “train on” the article? It’s complicated: under n-gram definitions of train-set inclusion, LLMs can complete “unseen” texts—both after data deletion and adding “gibberish” data. Our results impact unlearning, MIAs & data transparency🧵

311

179

38.0K

Percy Liang Retweeted

Daniel Kang@daniel_d_kang · Jul 8

As AI agents near real-world use, how do we know what they can actually do? Reliable benchmarks are critical but agentic benchmarks are broken! Example: WebArena marks "45+8 minutes" on a duration calculation task as correct (real answer: "63 minutes"). Other benchmarks…

20.0K

Percy Liang Retweeted

elie@eliebakouch · Jul 8

Super excited to share SmolLM3, a new strong 3B model. SmolLM3 is fully open, we share the recipe, the dataset, the training codebase and much more! > Train on 11T token on 384 H100 for 220k GPU hours > Support long context up to 128k thanks to NoPE and intra document masking >…

137

834

474

114.0K

Percy Liang Retweeted

Nathan Lambert@natolambert · Jul 4

My latest post: The American DeepSeek Project Build fully open models in the US in the next two years to enable a flourishing, global scientific AI ecosystem to balance China's surge in open-source and an alternative to building products ontop of leading closed models.

637

144

138.0K

Percy Liang Retweeted

Christopher Manning@chrmanning · Jul 2

I’ve joined @aixventureshq as a General Partner, working on investing in deep AI startups. Looking forward to working with founders on solving hard problems in AI and seeing products come out of that! Thank you @ychernova at @WSJ for covering the news: wsj.com/articles/ai-re…

514

49.0K

Percy Liang@percyliang · Jul 1

While doing WSD cooldowns for the marin.community project, this gradient increase led to problematic loss ascent. We patched it with Z-loss, but AdamC feels better™️. So over the weekend, I ran 4 experiments—130M to 1.4B params—all at ~compute-optimal token counts...🧵

AAaron Defazio@aaron_defazio · Jun 5

Why do gradients increase near the end of training? Read the paper to find out! We also propose a simple fix to AdamW that keeps gradient norms better behaved throughout training. arxiv.org/abs/2506.02285

104

102

24.0K

Percy Liang@percyliang · Jun 26

Open development of language models in action!

DDavid Hall@dlwh · Jun 26

So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me. )

109

15.0K

Percy Liang Retweeted

Dawn Song@dawnsongtweets · Jun 18

1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖…

141

484

333

101.0K