Jacob Austin
@jacobaustin132
Research at @GoogleDeepMind. Currently making LLMs go fast. I also play piano and climb. NYC. Opinions my own
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model”, which we’re releasing today. 1/n
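A taste of the kind of math this is about (a minimal roofline sketch with made-up hardware numbers, not figures from the book): a matmul’s runtime is roughly the larger of its compute time and its memory time, and comparing the two tells you whether you’re compute-bound or bandwidth-bound.

```python
# Back-of-the-envelope roofline for a matmul C = A @ B, with A: (M, K), B: (K, N).
# The hardware constants below are hypothetical placeholders for illustration.
PEAK_FLOPS = 2e14       # assumed 200 TFLOP/s of bf16 compute
HBM_BANDWIDTH = 1e12    # assumed 1 TB/s of memory bandwidth
BYTES_PER_ELEM = 2      # bf16

def matmul_roofline(M, K, N):
    flops = 2 * M * K * N                                   # one multiply-add per (m, k, n) triple
    bytes_moved = BYTES_PER_ELEM * (M * K + K * N + M * N)  # read A and B, write C
    t_compute = flops / PEAK_FLOPS
    t_memory = bytes_moved / HBM_BANDWIDTH
    bound = "compute" if t_compute > t_memory else "memory"
    return max(t_compute, t_memory), bound

print(matmul_roofline(16, 8192, 8192))    # small batch: memory-bound
print(matmul_roofline(4096, 8192, 8192))  # large batch: compute-bound
```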

it’s so over
Some news: We're building the next big thing — the first-ever AI-only social video app, built on a highly expressive human video model. Over the past few weeks, we’ve been testing it in private beta. Now, we’re opening early access: download the iOS app to join the waitlist, or…
Advanced version of Gemini Deep Think (announced at #GoogleIO) using parallel inference time computation achieved gold-medal performance at IMO, solving 5/6 problems with rigorous proofs as verified by official IMO judges! Congrats to all involved! deepmind.google/discover/blog/…
These talks are awesome and all on YouTube!
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
An interesting phenomenon: because LLM API providers are generally not able to log or view customer traffic, jailbreaks can in theory exist undetected unless the API customer has sufficient monitoring in place
Breaking News: A Columbia student activist, a legal permanent resident, was arrested by ICE at a meeting he thought was a step to becoming a U.S. citizen. nyti.ms/4j9w7JR
I'm glad to see at least one university has the slightest semblance of principle left.
I’m particularly proud right now to be a graduate of Harvard College. Thank you, President Garber. harvard.edu/president/news…
To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses. We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer…
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →…
Anyone done the Whitney Mountaineer's Route or the East Buttress? Planning to do one or the other in a few weeks but wanted to talk to someone who's done it
It's bizarre when relatively techno-utopian people are asked about how to solve declining fertility and instead of talking about artificial wombs, extended fertility spans, AI-assisted childcare, UBI, etc. they're suddenly like "well we just need to return to the 50s".
DeepSeek R1 inference in pure JAX! Currently on TPU, with GPU support and distilled models in progress. Features MLA-style attention, expert/tensor parallelism & int8 quantization. Contributions welcome!
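For a feel of what the int8 piece might look like (a minimal JAX sketch of symmetric per-channel quantization with int32 accumulation; `quantize_int8` and `int8_matmul` are illustrative names, not the repo's API):

```python
import jax
import jax.numpy as jnp

def quantize_int8(x, axis):
    """Symmetric int8 quantization with per-channel scales along `axis`."""
    scale = jnp.max(jnp.abs(x), axis=axis, keepdims=True) / 127.0
    q = jnp.clip(jnp.round(x / scale), -127, 127).astype(jnp.int8)
    return q, scale

def int8_matmul(x, w):
    """x @ w with both operands in int8, int32 accumulation, f32 rescale."""
    x_q, x_s = quantize_int8(x, axis=1)  # per-row activation scales, (B, 1)
    w_q, w_s = quantize_int8(w, axis=0)  # per-column weight scales, (1, N)
    acc = jax.lax.dot_general(
        x_q, w_q,
        dimension_numbers=(((1,), (0,)), ((), ())),  # contract x's dim 1 with w's dim 0
        preferred_element_type=jnp.int32,            # accumulate in int32, not int8
    )
    return acc.astype(jnp.float32) * x_s * w_s       # undo both scales

x = jax.random.normal(jax.random.PRNGKey(0), (8, 512))
w = jax.random.normal(jax.random.PRNGKey(1), (512, 256))
print(jnp.max(jnp.abs(int8_matmul(x, w) - x @ w)))   # small quantization error
```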
"CONGESTION PRICING IS DEAD. Manhattan, and all of New York, is SAVED. LONG LIVE THE KING!" –President Donald J. Trump
Some awesome stuff here about LLM scaling (esp. on GPUs). Their LLAMA sharding/memory diagram is great. Glad to see it becoming easier to understand scaling in the open
The Ultra-Scale Playbook: Training LLMs on GPU Clusters Learn how to train your own DeepSeek-V3 model using 5D parallelism, ZeRO, fast kernels, compute/comm overlap and bottlenecks with theory, interactive plots and 4000+ scaling experiments and audio! huggingface.co/spaces/nanotro…
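If you want to poke at the sharding story yourself, here's a minimal JAX sketch of 1D tensor parallelism (the mesh axis name and array sizes are illustrative; real 5D setups shard along several axes at once): column-shard a weight matrix across devices and let the compiler place the collectives.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis ("model") over all local devices; a full 5D setup would
# combine data, tensor, pipeline, expert, and sequence axes.
mesh = Mesh(np.array(jax.devices()), ("model",))

x = jnp.ones((8, 512))     # activations, replicated on every device
w = jnp.ones((512, 1024))  # weight, split along its output dimension
x = jax.device_put(x, NamedSharding(mesh, P(None, None)))
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

@jax.jit
def layer(x, w):
    # XLA sees the input shardings, runs a local matmul per device,
    # and leaves the output sharded over its last dimension.
    return x @ w

y = layer(x, w)
print(y.sharding)  # sharded over the "model" axis, no gather needed yet
```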