Jason Lee
@jasondeanlee
Associate Professor at UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning.
Our new work on scaling laws that include compute, model size, and number of samples. The proof relies on an extremely fine-grained analysis of online SGD, built up over the last 8 years of understanding SGD on simple toy models (tensors, single-index models, multi-index models).
Excited to announce a new paper with Yunwei Ren, Denny Wu, @jasondeanlee! We prove a neural scaling law in the SGD learning of extensive-width two-layer neural networks. arxiv.org/abs/2504.19983 🧵 below (1/10)
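For readers who haven't seen these toy settings, here is a minimal Python sketch (not from the paper) of online SGD on a single-index model, one of the toy models mentioned above; the dimension, step size, and cubic Hermite link function are illustrative assumptions.

```python
# Minimal sketch (not from the paper): online SGD on a single-index model.
# Dimension, step size, and the He_3 link function are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 128                                   # ambient dimension (assumed)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)          # ground-truth direction on the sphere
link = lambda z: z**3 - 3 * z             # He_3 link, information exponent 3 (assumed)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
lr = 1e-3                                 # step size (assumed)

for t in range(200_000):                  # one fresh sample per step = online SGD
    x = rng.standard_normal(d)
    y = link(x @ w_star)
    pred = link(x @ w)
    # gradient of 0.5 * (pred - y)^2 w.r.t. w, chain rule through the link
    grad = (pred - y) * (3 * (x @ w) ** 2 - 3) * x
    w -= lr * grad
    w /= np.linalg.norm(w)                # project back to the sphere
    if t % 50_000 == 0:
        # overlap with the target direction; should grow over a long enough run
        print(t, abs(w @ w_star))
```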
@aldopacchiano @AlexGDimakis @YejinChoinka @prfsanjeevarora @jasondeanlee @yayitsamyzhang @lateinteraction @natashajaques @LukeZettlemoyer @Aaroth @LesterMackey @ysu_nlp @tydsh @pinyuchenTW @pulkitology .... I can go on forever...
I predict though that within the next year many other teams will achieve this milestone, and without using as much compute. Hoping Goedel-Prover V3 from @PrincetonPLI will too.
Another AI system, ByteDance's SeedProver, solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended compute. This is becoming routine, like when we went to the moon for the fourth time. There is *nothing* "routine" about this!!...
I bet pretty soon a Chinese research org drops an LLM scaling-laws-for-RL paper. Closed frontier labs have definitely done this and won't share it; academics haven't mastered the data + infra tweaks yet.
Watching in hope that it reveals how to make 100m
I have a hot take that most people still underestimate how impactful AI will be. Last month I gave two talks at Columbia and Harvard on the state of AI and how I slowly got AGI‑pilled over the last decade (yes, I was very skeptical about AGI 10 years ago). Many friends who…
Our Goedel-prover-V2 is featured on the front page of the Princeton AI lab news! (Photo with @Yong18850571 and @sangertang1999 😁) ai.princeton.edu/news/2025/prin…
The positive results proved in this paper are fascinating. They incorporate many new concepts that practitioners use, such as: 1. Can models learn to expand on their reasoning and develop different reasoning paths? 2. Can they backtrack?
🧵 In our earlier threads, we explored what it means to learn to reason, why it’s hard, and why many current approaches fall short. Now let’s dive into one of the core proofs from our paper. Appendix A of our paper shows that an auto-regressive transformer cannot learn to…
Autoregressive Transformers are computational devices that use T space and T^2 time to produce T tokens (so space is sublinear in time). In terms of this relative relationship between space and time, they match Ryan Williams' breakthrough result that every Turing machine running…
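Not part of the original thread, but here is a quick back-of-the-envelope in LaTeX of how the two statements line up, assuming a fixed-size model so that the KV cache dominates space and attention over the prefix dominates time.

```latex
% Sketch, not from the thread (needs amsmath): space/time profile of
% autoregressive decoding vs. Williams' simulation bound, assuming fixed model size.
\[
\underbrace{S(T) = O(T)}_{\text{KV cache for } T \text{ tokens}}
\qquad
\underbrace{t(T) = O(T^2)}_{\text{attention over every prefix}}
\quad\Longrightarrow\quad
S = O\!\big(\sqrt{t}\big).
\]
\[
\text{Williams (2025):}\quad
\mathrm{TIME}[t] \subseteq \mathrm{SPACE}\!\left[O\!\left(\sqrt{t \log t}\right)\right],
\]
% so in both cases space grows roughly like the square root of the running time.
```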
Honored to receive the NSF CAREER Award from the Robust Intelligence (RI) program and the CORE grant from the Information Integration and Informatics (III) program! Deep gratitude to my home institute @siebelschool, students, colleagues, and mentors for their unwavering support!
Super excited about our latest open model! We have been carefully designing new post-training data and algorithmic pipelines to ensure generalization into unseen domains, and more results will be released soon!
Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and drop-in replacement for V1. Open-weight, code and data on HF huggingface.co/nvidia/Llama-3…
Is there a tech report for this?
🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding…
What are the best LLM pre-training papers? That give the most insight into the process. Current/recent, and older papers that stand the test of time.
Interested in foundational aspects? Waiting for or unhappy about NeurIPS reviews? Plz consider the NeurIPS workshop DynaFront: Dynamics at the Frontiers of Optimization, Sampling, and Games sites.google.com/view/dynafront… @yuejiec @Andrea__M @btreetaiji @T_Chavdarova ++ Sponsors appreciated!
Is LLM use finally making me less capable? I started using LLMs three years ago for text and code gen. Now, I use several of them, for a ton more things. In fact, I feel like I use them for a huge fraction of the cognitive tasks that I perform that can be described in text.…
Code release! 🚀 Following up on our IMO 2025 results with the public LLM Gemini 2.5 Pro — here’s the full pipeline & general (non-problem-specific) prompts. 👉 [github.com/lyang36/IMO25] Have fun exploring! #AI #Math #LLMs #IMO2025
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
I started reading the paper more carefully, especially the proofs in Appendix A. It seems that your goal is to learn a function g which gives you the correct tens digit of the multiplication, which is a simpler version of the whole multiplication, but the underlying…
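Not from the exchange itself, but a tiny concrete example of the kind of target described above; the input encoding (two plain integers) is an assumption, since the paper may encode operands as digit sequences.

```python
# Tiny illustration (not from the paper): a target g that returns only the
# tens digit of a product. Encoding the operands as two ints is an assumption.
def g(a: int, b: int) -> int:
    """Tens digit of a * b, e.g. g(47, 86) -> 4 because 47 * 86 = 4042."""
    return (a * b // 10) % 10

assert g(47, 86) == 4   # 4042 -> tens digit 4
assert g(3, 3) == 0     # 9 -> tens digit 0
```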
Prompting Llama 3.1 70B with "Mr and Mrs. D" can seed the generation of a near-exact copy of the entire ~300-page book 'Harry Potter & the Sorcerer's Stone' 🤯 We define a "near-copy" as text that is identical modulo minor spelling / punctuation variations. Below…
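Not the authors' metric, but a minimal sketch of one way to operationalize "near-copy": normalize case, punctuation, and whitespace, then threshold a similarity score; the 0.98 threshold is my own illustrative choice.

```python
# Sketch (not the authors' metric): flag "near-copies" by normalizing away
# case/punctuation/whitespace and thresholding a similarity ratio.
import re
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)       # drop punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

def is_near_copy(generated: str, reference: str, threshold: float = 0.98) -> bool:
    a, b = normalize(generated), normalize(reference)
    return SequenceMatcher(None, a, b).ratio() >= threshold

print(is_near_copy("Mr. and Mrs. Dursley, of number four...",
                   "Mr and Mrs Dursley of number four..."))  # True
```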
When I mention "Jensen's inequality" in my MBA course (we teach the flaw of averages) people just assume it was invented by Jensen Huang lol