Jason Lee
@jasondeanlee
Associate Professor at UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning.
Our new work on scaling laws that include compute, model size, and number of samples. The proof relies on an extremely fine-grained analysis of online SGD, built up over the last 8 years of understanding SGD on simple toy models (tensors, single-index models, multi-index models).
Excited to announce a new paper with Yunwei Ren, Denny Wu, @jasondeanlee! We prove a neural scaling law in the SGD learning of extensive-width two-layer neural networks. arxiv.org/abs/2504.19983 🧵 below (1/10)
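For readers who haven't seen these toy settings, here is a minimal Python sketch (not from the paper) of online SGD on a single-index model, one of the toy models mentioned above; the dimension, step size, and cubic Hermite link function are illustrative assumptions.

```python
# Minimal sketch (not from the paper): online SGD on a single-index model.
# Dimension, step size, and the He_3 link function are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 128                                   # ambient dimension (assumed)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)          # ground-truth direction on the sphere
link = lambda z: z**3 - 3 * z             # He_3 link, information exponent 3 (assumed)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
lr = 1e-3                                 # step size (assumed)

for t in range(200_000):                  # one fresh sample per step = online SGD
    x = rng.standard_normal(d)
    y = link(x @ w_star)
    pred = link(x @ w)
    # gradient of 0.5 * (pred - y)^2 w.r.t. w, chain rule through the link
    grad = (pred - y) * (3 * (x @ w) ** 2 - 3) * x
    w -= lr * grad
    w /= np.linalg.norm(w)                # project back to the sphere
    if t % 50_000 == 0:
        # overlap with the target direction; should grow over a long enough run
        print(t, abs(w @ w_star))
```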
@aldopacchiano @AlexGDimakis @YejinChoinka @prfsanjeevarora @jasondeanlee @yayitsamyzhang @lateinteraction @natashajaques @LukeZettlemoyer @Aaroth @LesterMackey @ysu_nlp @tydsh @pinyuchenTW @pulkitology .... I can go on forever...
I predict though that within the next year many other teams will achieve this milestone, and without using as much compute. Hoping Goedel-Prover V3 from @PrincetonPLI will too.
Another AI system, ByteDance's SeedProver, solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended compute. This is becoming routine, like when we went to the moon for the fourth time. There is *nothing* "routine" about this!!...
I bet pretty soon a Chinese research org drops an LLM scaling-laws-for-RL paper. Closed frontier labs have definitely done this and won't share it; academics haven't mastered the data + infra tweaks yet.
Watching in hope that it reveals how to make 100m
I have a hot take that most people still underestimate how impactful AI will be. Last month I gave two talks at Columbia and Harvard on the state of AI and how I slowly got AGI‑pilled over the last decade (yes, I was very skeptical about AGI 10 years ago). Many friends who…
Our Goedel-prover-V2 is featured on the front page of the Princeton AI lab news! (Photo with @Yong18850571 and @sangertang1999 😁) ai.princeton.edu/news/2025/prin…
The positive results proved in this paper are fascinating. They incorporate many new concepts that practitioners use, such as: 1. Can models learn to expand on their reasoning and develop different reasoning paths? 2. Can they backtrack?
🧵 In our earlier threads, we explored what it means to learn to reason, why it’s hard, and why many current approaches fall short. Now let’s dive into one of the core proofs from our paper. Appendix A of our paper shows that an auto-regressive transformer cannot learn to…
Autoregressive Transformers are computational devices that use T space and T^2 time to produce T tokens (so space is sublinear in time). In terms of this relative relationship between space and time, they match Ryan Williams' breakthrough result that every Turing machine running…
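Not part of the original thread, but here is a quick back-of-the-envelope in LaTeX of how the two statements line up, assuming a fixed-size model so that the KV cache dominates space and attention over the prefix dominates time.

```latex
% Sketch, not from the thread (needs amsmath): space/time profile of
% autoregressive decoding vs. Williams' simulation bound, assuming fixed model size.
\[
\underbrace{S(T) = O(T)}_{\text{KV cache for } T \text{ tokens}}
\qquad
\underbrace{t(T) = O(T^2)}_{\text{attention over every prefix}}
\quad\Longrightarrow\quad
S = O\!\big(\sqrt{t}\big).
\]
\[
\text{Williams (2025):}\quad
\mathrm{TIME}[t] \subseteq \mathrm{SPACE}\!\left[O\!\left(\sqrt{t \log t}\right)\right],
\]
% so in both cases space grows roughly like the square root of the running time.
```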
Honored to receive the NSF CAREER Award from the Robust Intelligence (RI) program and the CORE grant from the Information Integration and Informatics (III) program! Deep gratitude to my home institute @siebelschool, students, colleagues, and mentors for their unwavering support!
Super excited about our latest open model! We have been carefully designing new post-training data and algorithmic pipelines to ensure generalization into unseen domains, and more results will be released soon!
Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and drop-in replacement for V1. Open-weight, code and data on HF huggingface.co/nvidia/Llama-3…
Is there a tech report for this?
🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding…
What are the best LLM pre-training papers? That give the most insight into the process. Current/recent, and older papers that stand the test of time.
Interested in foundational aspects? Waiting for or unhappy about NeurIPS reviews? Plz consider the NeurIPS workshop DynaFront: Dynamics at the Frontiers of Optimization, Sampling, and Games sites.google.com/view/dynafront… @yuejiec @Andrea__M @btreetaiji @T_Chavdarova ++ Sponsors appreciated!
Is LLM use finally making me less capable? I started using LLMs three years ago for text and code gen. Now, I use several of them, for a ton more things. In fact, I feel like I use them for a huge fraction of the cognitive tasks that I perform that can be described in text.…
Code release! 🚀 Following up on our IMO 2025 results with the public LLM Gemini 2.5 Pro — here’s the full pipeline & general (non-problem-specific) prompts. 👉 [github.com/lyang36/IMO25] Have fun exploring! #AI #Math #LLMs #IMO2025
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
I started reading the paper more carefully, especially the proofs in Appendix A. It seems that your goal is to learn a function g which gives you the correct tens digit of the multiplication, which is a simpler version of the whole multiplication, but the underlying…
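Not from the exchange itself, but a tiny concrete example of the kind of target described above; the input encoding (two plain integers) is an assumption, since the paper may encode operands as digit sequences.

```python
# Tiny illustration (not from the paper): a target g that returns only the
# tens digit of a product. Encoding the operands as two ints is an assumption.
def g(a: int, b: int) -> int:
    """Tens digit of a * b, e.g. g(47, 86) -> 4 because 47 * 86 = 4042."""
    return (a * b // 10) % 10

assert g(47, 86) == 4   # 4042 -> tens digit 4
assert g(3, 3) == 0     # 9 -> tens digit 0
```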
Prompting Llama 3.1 70B with "Mr and Mrs. D" can seed the generation of a near-exact copy of the entire ~300-page book 'Harry Potter & the Sorcerer's Stone' 🤯 We define a "near-copy" as text that is identical modulo minor spelling / punctuation variations. Below…
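Not the authors' metric, but a minimal sketch of one way to operationalize "near-copy": normalize case, punctuation, and whitespace, then threshold a similarity score; the 0.98 threshold is my own illustrative choice.

```python
# Sketch (not the authors' metric): flag "near-copies" by normalizing away
# case/punctuation/whitespace and thresholding a similarity ratio.
import re
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)       # drop punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

def is_near_copy(generated: str, reference: str, threshold: float = 0.98) -> bool:
    a, b = normalize(generated), normalize(reference)
    return SequenceMatcher(None, a, b).ratio() >= threshold

print(is_near_copy("Mr. and Mrs. Dursley, of number four...",
                   "Mr and Mrs Dursley of number four..."))  # True
```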
When I mention "Jensen's inequality" in my MBA course (we teach the flaw of averages) people just assume it was invented by Jensen Huang lol