Andriy Burkov
@burkov
My new LM book: http://thelmbook.com PhD in AI, author of 📖 The Hundred-Page Language Models Book and 📖 The Hundred-Page Machine Learning Book
As you know, I've resigned from my full-time job and become a professional technical book writer. I have plans for The Hundred-Page Books about reinforcement learning, computer vision, diffusion models, and more. Not having a full-time job has put a significant strain on my…

If you never question the code the LLM writes, you will end up with an oversized codebase that implements all the tricks the LLM saw online, whether you need them or not.

We have a very poor understanding of why deep neural networks like transformer models learn the parameters they learn. For example, in the paper below from 2013, the authors demonstrated that 5% of the weights of a trained deep neural network can be used to predict the values of…

I asked Claude to fix a bug and provided only a part of the application code because the full code wouldn't fit into the context. It printed updated code files, including one I hadn't shown it. So it hallucinated this file entirely, and it had exactly the same code as the…
The updated reasoning model Qwen3-235B-A22B-Thinking-2507, with only 235B total parameters and 22B active parameters (it's an MoE), is competitive with the frontier LLMs and is Apache 2.0. The Chinese don't let greedy capitalists abuse customers. Communism 2.0.


The Overton window in action. Just a couple of years before ChatGPT was released, Google had to ban gorillas in its image search theguardian.com/technology/201…, while Microsoft had to shut down their chatbot because it started to insult users with racist remarks bbc.com/news/technolog….…
sounds like everything is going great.
True Positive Weekly #119, by @burkov open.substack.com/pub/aiweekly/p…
On the time savings from using an LLM for coding: On one hand, you can code in 5 minutes what would take a day by hand. On the other hand, you can spend 2 days fixing a bug that would take only 5 minutes by hand.
Why, in more than 10 years, could only one person make universities abolish racist admission policies? Who will compensate for the ruined lives and future careers of thousands of affected prospective students?

Any news on diffusion language models? The last thing I heard was from Google on Gemini Diffusion in May. The limitation I'm curious about is how they will solve the fixed output size any diffusion model is supposed to have, so printing long chunks of code should be impossible.…
In one photo, you can see bread. In the other photo, you can see shit. I've been living in North America for 20 years and still cannot understand why grocery stores sell shit.


The Dark Hundred-Page Language Models Book Link - amzn.to/4mafvD0 #MachineLearning #ML #AI #LLMs #MLOps #DeepLearning #DL #DataScience #DataAnalytics #code #Coding #Python #PyTorch #programming
The Hundred-Page Machine Learning Book (The Hundred-Page Books) Link - amzn.to/3GXTiZU #MachineLearning #ML #AI #ArtificialIntelligence #code #Coding #Python #DataScience #SoftwareEngineering #programming #DeepLearning #LLMs
Tesla should become a car insurance company!
NEWS: Tesla has revealed that in Q2 2025, they recorded one crash for every 6.69 million miles driven in which drivers were using Autopilot technology. For drivers who were not using Autopilot technology, @Tesla recorded one crash for every 963,000 miles driven. By comparison,…
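For a sense of scale, the two figures quoted above can be compared directly. This is only a back-of-the-envelope sketch: the variable and function names are made up for illustration, and it uses nothing beyond the two numbers in the post.

```python
# Figures quoted in the post (miles driven per recorded crash, Q2 2025).
MILES_PER_CRASH_AUTOPILOT = 6_690_000
MILES_PER_CRASH_NO_AUTOPILOT = 963_000


def crashes_per_million_miles(miles_per_crash: float) -> float:
    """Convert 'one crash every N miles' into crashes per million miles."""
    return 1_000_000 / miles_per_crash


# How many times farther Autopilot drivers went between recorded crashes.
ratio = MILES_PER_CRASH_AUTOPILOT / MILES_PER_CRASH_NO_AUTOPILOT

if __name__ == "__main__":
    print(f"Autopilot:    {crashes_per_million_miles(MILES_PER_CRASH_AUTOPILOT):.3f} crashes per million miles")
    print(f"No Autopilot: {crashes_per_million_miles(MILES_PER_CRASH_NO_AUTOPILOT):.3f} crashes per million miles")
    print(f"Ratio of miles between crashes: {ratio:.2f}x")
```

On these two numbers alone, the gap is roughly a factor of seven.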
Watering down the tokens generated by an expensive model with tokens generated by a cheap model is the best way of ensuring that the quality degrades controllably as the inference cost goes down. Distillation and quantization demand choosing the values of the hyperparameters of…
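A toy sketch of the idea, under loud assumptions: the post does not say how the tokens are mixed, so this assumes each token is drawn from the expensive model with probability `p_expensive` and from the cheap one otherwise. The step functions and per-token costs are hypothetical stand-ins, not real model APIs or prices.

```python
import random
from typing import Callable, List

# A "step" takes the tokens generated so far and returns the next token.
Step = Callable[[List[str]], str]


def expected_cost_per_token(p_expensive: float,
                            cost_expensive: float,
                            cost_cheap: float) -> float:
    """Expected cost per token when mixing two models (linear in p_expensive)."""
    return p_expensive * cost_expensive + (1.0 - p_expensive) * cost_cheap


def mixed_generate(n_tokens: int,
                   p_expensive: float,
                   expensive_step: Step,
                   cheap_step: Step,
                   seed: int = 0) -> List[str]:
    """Generate n_tokens, choosing the model for each token at random.

    Assumption: a per-token coin flip; the original post does not
    specify the mixing scheme.
    """
    rng = random.Random(seed)
    tokens: List[str] = []
    for _ in range(n_tokens):
        step = expensive_step if rng.random() < p_expensive else cheap_step
        tokens.append(step(tokens))
    return tokens


if __name__ == "__main__":
    # Hypothetical stand-ins that just label which model produced each token.
    out = mixed_generate(10, 0.3, lambda t: "E", lambda t: "C")
    print(out)
    print(expected_cost_per_token(0.3, cost_expensive=10.0, cost_cheap=2.0))
```

In this sketch the single knob `p_expensive` moves the expected cost linearly between the two models' prices, which is one way to read "controllably".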