Andriy Burkov

@burkov

My new LM book: http://thelmbook.com · PhD in AI, author of 📖 The Hundred-Page Language Models Book and 📖 The Hundred-Page Machine Learning Book

Québec, Canada

Joined June 2009

111 Following

47K Followers

Pinned

Andriy Burkov@burkov · Apr 10

As you know, I've resigned from my full-time job and become a professional technical book writer. I have plans for The Hundred-Page Books about reinforcement learning, computer vision, diffusion models, and more. Not having a full-time job has put a significant strain on my…

Andriy Burkov@burkov · 1 h

If you never question what the LLM writes as code, you will end up with an oversized codebase that implements all the tricks the LLM saw online, whether you need them or not.

Andriy Burkov@burkov · 9 h

We have a very poor understanding of why deep neural networks like transformer models learn the parameters they learn. For example, in the paper below from 2013, the authors demonstrated that 5% of the weights of a trained deep neural network can be used to predict the values of…

Andriy Burkov@burkov · 13 h

I asked Claude to fix a bug and provided only a part of the application code because the full code wouldn't fit into the context. It printed updated code files, including one I hadn't shown it. So it hallucinated this file entirely, and it had exactly the same code as the…

Andriy Burkov@burkov · 14 h

The updated reasoning model Qwen3-235B-A22B-Thinking-2507, with only 235B parameters, of which 22B are active (it's an MoE), is competitive with the frontier LLMs and is licensed under Apache 2.0. The Chinese don't let greedy capitalists abuse customers. Communism 2.0.
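
For a sense of what "22B active" means in practice, here is a back-of-the-envelope sketch that uses only the two counts encoded in the model name (235B total, 22B active). The FLOPs estimate relies on an assumption: the common rough rule of about 2 FLOPs per used parameter per generated token in a forward pass, which ignores attention over the context and router overhead.

```python
# Parameter counts encoded in the model name Qwen3-235B-A22B:
# 235B total parameters, of which about 22B are active for each token (MoE routing).
TOTAL_PARAMS = 235e9
ACTIVE_PARAMS = 22e9

def forward_flops_per_token(params: float) -> float:
    """Rough rule of thumb (assumption): ~2 FLOPs per parameter used, per generated token.
    Ignores attention over the context and the router's own cost."""
    return 2 * params

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of the same size

print(f"Active share of weights per token: {active_fraction:.1%}")
print(f"Approx. forward FLOPs per token: {moe_flops / 1e9:.0f} GFLOPs (MoE) vs {dense_flops / 1e9:.0f} GFLOPs (dense)")
```

Each token touches roughly 9.4% of the weights, so per-token compute is about an order of magnitude below a dense model of the same size. All 235B weights still have to be stored, though: the saving is in compute per token, not in memory footprint.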

Andriy Burkov@burkov · 15 h

Andriy Burkov@burkov · Jul 25

The Overton window in action. Just a couple of years before ChatGPT was released, Google had to ban gorillas in its image search theguardian.com/technology/201…, while Microsoft had to shut down its chatbot because it started insulting users with racist remarks bbc.com/news/technolog….…

Gary Marcus@GaryMarcus · Jul 24

sounds like everything is going great.

Andriy Burkov@burkov · Jul 24

True Positive Weekly #119, by @burkov open.substack.com/pub/aiweekly/p…

Andriy Burkov@burkov · Jul 24

On the time savings from using LLMs for coding: On one hand, you can code in 5 minutes what would take a day by hand. On the other hand, you can spend 2 days fixing a bug that would take only 5 minutes to fix by hand.

Andriy Burkov@burkov · Jul 24

Why, in more than 10 years, could only one person make universities abolish racist admission policies? Who will compensate for the ruined lives and future careers of thousands of affected prospective students?

Andriy Burkov@burkov · Jul 24

Any news on diffusion language models? The last thing I heard was from Google on Gemini Diffusion in May. The limitation I'm curious about is how they will solve the fixed output size any diffusion model is supposed to have, so printing long chunks of code should be impossible.…

Andriy Burkov@burkov · Jul 24

In one photo, you can see bread. In the other photo, you can see shit. I've been living in North America for 20 years and still cannot understand why grocery stores sell shit.

Andriy Burkov Retweeted

Domesticated Brain@rasangarocks · Jul 24

The Dark Hundred-Page Language Models Book Link - amzn.to/4mafvD0 #MachineLearning #ML #AI #LLMs #MLOps #DeepLearning #DL #DataScience #DataAnalytics #code #Coding #Python #PyTorch #programming

Andriy Burkov Retweeted

Domesticated Brain@rasangarocks · Jul 23

The Hundred-Page Machine Learning Book (The Hundred-Page Books) Link - amzn.to/3GXTiZU #MachineLearning #ML #AI #ArtificialIntelligence #code #Coding #Python #DataScience #SoftwareEngineering #programming #DeepLearning #LLMs #programming

Andriy Burkov@burkov · Jul 24

Tesla should become a car insurance company!

Sawyer Merritt@SawyerMerritt · Jul 23

NEWS: Tesla has revealed that in Q2 2025, they recorded one crash for every 6.69 million miles driven in which drivers were using Autopilot technology. For drivers who were not using Autopilot technology, @Tesla recorded one crash for every 963,000 miles driven. By comparison,…
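
The "insurance company" quip rests on the ratio between the two figures quoted above. A quick sketch of that arithmetic, using only those two numbers; it is a naive comparison that does not adjust for where and under what conditions Autopilot is engaged.

```python
# Figures quoted in the tweet above (Tesla, Q2 2025).
MILES_PER_CRASH_AUTOPILOT = 6_690_000
MILES_PER_CRASH_NO_AUTOPILOT = 963_000

def crashes_per_million_miles(miles_per_crash: float) -> float:
    """Convert 'one crash every N miles' into a crash rate per million miles."""
    return 1_000_000 / miles_per_crash

# How many times more miles pass between crashes with Autopilot engaged.
ratio = MILES_PER_CRASH_AUTOPILOT / MILES_PER_CRASH_NO_AUTOPILOT

print(f"Autopilot:    {crashes_per_million_miles(MILES_PER_CRASH_AUTOPILOT):.3f} crashes per million miles")
print(f"No Autopilot: {crashes_per_million_miles(MILES_PER_CRASH_NO_AUTOPILOT):.3f} crashes per million miles")
print(f"Ratio of miles between crashes: {ratio:.2f}x")
```

The ratio comes out at about 6.95, i.e., roughly seven times more miles between crashes in the Autopilot group.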

Andriy Burkov@burkov · Jul 24

Your vibe code, from the point of view of the computer running it.

Andriy Burkov@burkov · Jul 23

Watering down the tokens generated by an expensive model with tokens generated by a cheap model is the best way of ensuring that the quality degrades controllably as the inference cost goes down. Distillation and quantization demand choosing the values of the hyperparameters of…
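
One way to read the "watering down" idea, sketched under assumptions: at each generation step, a router hands the next token to the cheap model with probability `p_cheap` and to the expensive model otherwise. `mixed_generate`, `expected_cost_per_token`, and the dummy models are hypothetical illustrations, not an existing API, and per-token random routing is only one possible mixing scheme. What it shows is that expected cost moves linearly with a single continuous knob.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-in type: a "model" maps the tokens so far to the next token.
NextTokenFn = Callable[[List[str]], str]

def mixed_generate(expensive: NextTokenFn, cheap: NextTokenFn, prompt: List[str],
                   n_tokens: int, p_cheap: float, seed: int = 0) -> Tuple[List[str], int]:
    """Generate n_tokens, giving each step to the cheap model with probability p_cheap.

    Both models condition on the full mixed sequence, so tokens from one model
    become context for the other. Returns the new tokens and how many came
    from the cheap model.
    """
    if not 0.0 <= p_cheap <= 1.0:
        raise ValueError("p_cheap must be in [0, 1]")
    rng = random.Random(seed)
    tokens = list(prompt)
    n_cheap = 0
    for _ in range(n_tokens):
        if rng.random() < p_cheap:  # rng.random() is in [0, 1)
            tokens.append(cheap(tokens))
            n_cheap += 1
        else:
            tokens.append(expensive(tokens))
    return tokens[len(prompt):], n_cheap

def expected_cost_per_token(cost_expensive: float, cost_cheap: float, p_cheap: float) -> float:
    """Expected cost per generated token: a linear interpolation controlled by p_cheap."""
    return (1.0 - p_cheap) * cost_expensive + p_cheap * cost_cheap

# Toy usage with dummy models that always emit a fixed token.
new_tokens, n_cheap = mixed_generate(lambda t: "E", lambda t: "c", ["<s>"], n_tokens=10, p_cheap=0.25)
print(new_tokens, n_cheap)
print(expected_cost_per_token(10.0, 1.0, 0.25))  # 7.75
```

The sketch assumes both models share one token vocabulary; mixing models with different tokenizers would require re-encoding the shared context for each.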
