Sasha Krassovsky
@bztree
Performance @AnthropicAI. Programmer who's gotta go fast. Love playing with new hardware and compilers. Formerly databases. Opinions my own.
Another thought about GPU kernel launch parameters: each kernel needs different ones, so if you want to chain a bunch of kernels together, you basically are forced to eat the kernel launch overhead. This is (iiuc) the principal reason why those guys that made the Llama…
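Rough sketch of what I mean (illustrative CUDA, names made up, not from any real codebase): two kernels that each want their own launch configuration, so chaining them means two separate dispatches, each paying its own launch overhead. CUDA Graphs can amortize some of the CPU-side cost, but every kernel boundary is still its own launch.

__global__ void scale(float* x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

__global__ void shift(float* x, int n, float b) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += b;
}

int main() {
    const int n = 1 << 20;
    float* x = nullptr;
    cudaMalloc(&x, n * sizeof(float));

    // Different launch parameters per kernel, so they can't share one dispatch
    // without being rewritten as a single fused "megakernel".
    scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);  // launch #1: pays launch overhead
    shift<<<(n + 127) / 128, 128>>>(x, n, 1.0f);  // launch #2: pays it again

    cudaDeviceSynchronize();
    cudaFree(x);
    return 0;
}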
I don't know why, but the fact that Haskell separates the type declaration from the definition rubs me the wrong way. It feels so... outdated. Lean's syntax is much prettier
I am a proud owner of a Claude-procured tungsten cube. He also recently held a book-signing event for the launch of the book Floating Point Numerics for Games and Simulations, where I purchased a signed copy!
Anthropic staff realized they could ask Claude to buy things that weren’t just food & drink. After someone randomly decided to ask it to order a tungsten cube, Claude ended up with an inventory full of (as it put it) “specialty metal items” that it ended up selling at a loss.
Wow, we really are in the Stone Age of GPU programming...
Another interesting problem with kernel languages like CUDA and Metal: the grid and threadgroup sizes are often load-bearing for the kernel's correctness, but we currently don't have a way to encode this in the function signature and/or type system, meaning we just have to write…
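To make that concrete, here's a made-up CUDA example (mine, not from anything real): the kernel is only correct when launched with exactly 256 threads per block, but the only places that contract can live today are a comment and a host-side assert.

#include <cassert>

__global__ void block_sum_256(const float* in, float* out) {
    // CONTRACT: only correct when blockDim.x == 256; nothing in the
    // signature or type system enforces this.
    __shared__ float buf[256];
    int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * 256 + tid];
    __syncthreads();

    // Tree reduction over the 256-element shared buffer.
    for (int stride = 128; stride > 0; stride >>= 1) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];
}

void launch_block_sum(const float* in, float* out, int num_blocks, int block_size) {
    // Best we can do: a runtime check at the launch site.
    assert(block_size == 256 && "block_sum_256 requires exactly 256 threads per block");
    block_sum_256<<<num_blocks, block_size>>>(in, out);
}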
My top two unsolved ML compiler problems:
- Type systems: currently it's impossible to make sure your code is doing the right thing. Subtle precision bugs, or just outright doing the wrong math. I want a type system that helps me statically verify something (sketch below)
- Expressing sparsity:…
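On the precision point, a toy CUDA example of the kind of bug I'd want caught statically (my own illustration): accumulating in fp16 silently stalls once the running sum hits 2048, because adding 1.0 no longer changes the value, while the fp32 version stays exact.

#include <cstdio>
#include <cuda_fp16.h>

__global__ void sum_fp16(float* out, int n) {
    __half acc = __float2half(0.0f);
    for (int i = 0; i < n; ++i)
        // Round back to fp16 after every add, like an fp16 accumulator would:
        // once acc reaches 2048, 2048 + 1 rounds back down to 2048.
        acc = __float2half(__half2float(acc) + 1.0f);
    *out = __half2float(acc);
}

__global__ void sum_fp32(float* out, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += 1.0f;  // exact in fp32 for n well below 2^24
    *out = acc;
}

int main() {
    float h[2], *d;
    cudaMalloc(&d, 2 * sizeof(float));
    sum_fp16<<<1, 1>>>(d, 4096);
    sum_fp32<<<1, 1>>>(d + 1, 4096);
    cudaMemcpy(h, d, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("fp16-accumulated: %.0f  fp32-accumulated: %.0f\n", h[0], h[1]);  // ~2048 vs 4096
    cudaFree(d);
    return 0;
}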
A super interesting property of systems in equilibrium is that circular reasoning actually works!
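My gloss on this (the tweet doesn't spell it out): in equilibrium, "X holds because Y, and Y holds because X" isn't a fallacy, it's just the self-consistency condition of a fixed point.

\[
  x^{*} = f(x^{*}), \qquad \text{e.g. } x_{t+1} = f(x_t) \to x^{*} \text{ as } t \to \infty \text{ when } f \text{ is a contraction.}
\]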
Just came back from visiting Hong Kong and Taipei, and now I’m sad. Why can’t we have Asian megacities in the US?
Hats off to the ffmpeg guy for fighting the good fight in owning the ribs
Remember the post about rav1d, the dav1d AV1 decoder transpiled to Rust? Rust is actually 35% slower.
I wonder if the buildout of these GW datacenters will be accompanied by a Stuxnet v2, which will subtly overclock all the GPUs until they burn out
How do you actually profile a Metal kernel? I want at minimum to be able to know the execution time of the kernel not taking into account any dispatch time, but that seems hard to do?
Finally, @AnthropicAI's @Si_Boehm and @bztree break down what it’s like running inference on NVIDIA GPUs, Google TPUs, and AWS Trainium, including architecture quirks, performance tradeoffs, and the tools they use to make it all work: youtube.com/watch?v=-k6yik…
Dear Haskellers / other type enthusiasts: what kind of type system can I employ to ensure correctness of programs where the only type is float32?
Ok update here!! Tenstorrent reached out and got this taken care of! Turns out they had a trade-in program for those affected by the deprecation. Glad they take customers' wellbeing seriously (even just a rando on X like me), and I'm excited to write some Wormhole kernels!
Just updated my Tenstorrent Metalium for the first time in like 6 months and I guess now my e75 is considered end of life and I can't use it?? What gives? If I get a Blackhole, is that going to be a paperweight in a few months again?
I feel like NP-completeness and recursive enumerability are actually the same thing, it’s just that NP-complete things have exponentially many variants and recursively enumerable things have infinitely many variants