Sebastian Raschka

@rasbt

ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://amzn.to/4fqvn0D).

Joined October 2012

1K Following

344K Followers

Pinned

Sebastian Raschka@rasbt · Jul 19

From GPT to MoE: I reviewed & compared the main LLMs of 2025 in terms of their architectural design, from DeepSeek-V3 to Kimi K2. Multi-head Latent Attention, sliding window attention, new Post- & Pre-Norm placements, NoPE, shared-expert MoEs, and more... magazine.sebastianraschka.com/p/the-big-llm-…

Tweet card: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design

Sebastian Raschka@rasbt · Jul 21

The new Qwen3 update takes back the benchmark crown from Kimi K2. Some highlights of how Qwen3 235B-A22B differs from Kimi K2:
- 4.25x smaller overall (235B vs. 1 trillion parameters) but has more layers (transformer blocks)
- 1.5x fewer active parameters (22B vs. 32B)
- far fewer experts in…

AiBattle@AiBattle_ · Jul 21

Kimi K2 🆚 Qwen-3-235B-A22B-2507: The new updated Qwen 3 model beats Kimi K2 on most benchmarks. The jump in the ARC-AGI score is especially impressive. An updated reasoning model is also on the way, according to Qwen researchers.
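
The size ratios in the Qwen3 vs. Kimi K2 comparison can be checked with quick arithmetic. This minimal sketch uses only the parameter counts stated in the tweet (235B vs. 1 trillion total, 22B vs. 32B active); the variable names are illustrative.

```python
# Parameter counts as stated in the tweet, in billions
qwen3_total, kimi_k2_total = 235, 1000
qwen3_active, kimi_k2_active = 22, 32

# How many times larger Kimi K2 is than Qwen3 235B-A22B
total_ratio = kimi_k2_total / qwen3_total     # roughly 4.26
active_ratio = kimi_k2_active / qwen3_active  # roughly 1.45

print(f"total: {total_ratio:.2f}x, active: {active_ratio:.2f}x")
```

This prints `total: 4.26x, active: 1.45x`, so the "4.25x" and "1.5x" figures are rounded.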

Sebastian Raschka@rasbt · Jul 13

Btw if you're learning how to build LLMs from the ground up, there's now a 17h companion video course for my LLMs From Scratch book on Manning: manning.com/livevideo/mast… It follows the book chapter by chapter, so it works great either as a standalone or code-along resource. It's…

Sebastian Raschka Retweeted

Yuchen Jin@Yuchenj_UW · Jul 11

Holy shit. Kimi K2 was pre-trained on 15.5T tokens using MuonClip with zero training spikes. Muon has officially scaled to the 1-trillion-parameter LLM level. Many doubted it could scale, but here we are. So proud of the Muon team: @kellerjordan0, @bozavlado, @YouJiacheng,…

Sebastian Raschka Retweeted

Sebastian Raschka@rasbt · Jul 8

Right now LLM providers serve you one-size-fits-all LLMs, which is one of the main bottlenecks.

Sebastian Raschka Retweeted

Shane Parrish@ShaneAParrish · Jul 7

I started coding again recently. No, not vibe coding. Actual coding. I fell in love with coding as a teenager. I’d stay up all night ‘playing on my computer’ and sleep all day in school. Coding is like a drug for me, everything in the world disappears. Hours pass like…

Sebastian Raschka@rasbt · Jul 7

There were many people who asked me about joining their startup & co-founding a startup in the past 2 weeks. Not career advice, but just my 2 cents: I actually think it’s a great time to start a bootstrapped startup compared to going the VC-backed route. These days, there are…

Sebastian Raschka@rasbt · Jun 29

I read a few chapters and I would recommend this for any AI enthusiast who is willing to read more. A lot of topics are very complex and Sebastian explains the "why" then gives papers, code and homework for you to figure out "what and how." 10/10

Sebastian Raschka@rasbt · Jun 29

Since it's summer, and more or less internship and tech interview season, I made all 30 chapters of my Machine Learning Q and AI book freely available for the summer: sebastianraschka.com/books/ml-q-and… Hope it’s helpful! Happy reading, and good luck if you are interviewing!

Tweet card: A curated book of 30 concise Q&A chapters on modern machine learning and AI, from embeddings to transformers to evaluation.

Sebastian Raschka@rasbt · Jun 27

Job roles in 2027: We let LLMs focus on the “how”. We focus on the “why”. Programmer → Code Composer Before: Writing code line by line After: Designing logic & structuring systems Web Dev → Experience Designer Before: Building layouts and components After: Defining flow,…