mike64_t
@mike64_t
descending the gradient
It is done. LibreCUDA can now launch CUDA kernels without relying on the proprietary CUDA runtime / driver API. It does so by communicating directly with the hardware via the ioctl-based "rm api" and Nvidia's QMD MMIO command-queue structure.
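The "rm api" here is the Nvidia kernel driver's resource-manager interface, driven through ioctl escapes on /dev/nvidiactl. A minimal sketch of that mechanism, not LibreCUDA's actual code: the request number and parameter struct below are made-up placeholders, and the real escapes (NV_ESC_RM_ALLOC, NV_ESC_RM_CONTROL, ...) are _IOWR-encoded values from the driver headers.

```python
import ctypes
import fcntl
import os

# Placeholder request number; real values come from the driver headers.
HYPOTHETICAL_RM_REQUEST = 0x2B

class HypotheticalRmParams(ctypes.Structure):
    # Field names and layout are illustrative, not the driver's real structs.
    _fields_ = [
        ("hClient", ctypes.c_uint32),
        ("hObject", ctypes.c_uint32),
        ("status", ctypes.c_uint32),
    ]

def rm_call(params: ctypes.Structure) -> None:
    # /dev/nvidiactl is the driver's control node; each escape call passes
    # a parameter struct that the kernel module fills in and copies back.
    fd = os.open("/dev/nvidiactl", os.O_RDWR)
    try:
        fcntl.ioctl(fd, HYPOTHETICAL_RM_REQUEST, params)
    finally:
        os.close(fd)

if os.path.exists("/dev/nvidiactl"):
    rm_call(HypotheticalRmParams())
```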

And that is why it is not yet the end of unsupervised learning. You can believe that.
Pretraining is an elegant science, done by mathematicians who sit in cold rooms writing optimization theory on blackboards and engineers with a total grasp of distributed systems of titanic scale; posttraining is hair-raising cowboy research where people drink a lot of Diet Coke…
The saga of wandb color debates continues. Apparently everyone at Prime Intellect is colorblind...

It's honestly incomprehensible to me that we haven't started writing training solutions like game engines: stable, well-designed abstractions in a clean zero-dependency C++ project. You know game engines also just build GPU command buffers, right?
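To make the analogy concrete, here is a toy Python sketch of the command-buffer pattern applied to a training step. The names (CommandBuffer, record, submit) are invented for illustration, not any engine's real API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class CommandBuffer:
    """Record work up front, then submit it in one batch, like a GPU queue."""
    commands: List[Callable[[], None]] = field(default_factory=list)

    def record(self, cmd: Callable[[], None]) -> "CommandBuffer":
        self.commands.append(cmd)
        return self

    def submit(self) -> None:
        # A real engine would hand the whole buffer to the GPU at once;
        # here we just replay the recorded commands on the CPU.
        for cmd in self.commands:
            cmd()

step = (CommandBuffer()
        .record(lambda: print("upload batch"))
        .record(lambda: print("forward + backward"))
        .record(lambda: print("optimizer step")))
step.submit()
```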
AI researchers when they discovered that torch.compile doesn't scale well to real multi-node production training workloads and is a giant footgun
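One concrete instance of the footgun, assuming a recent PyTorch: data-dependent Python control flow forces a graph break, so the compiled function silently splits and parts fall back to eager. Multiplied across ranks and input shapes, the breaks and recompiles get expensive.

```python
import torch

@torch.compile
def step(x: torch.Tensor) -> torch.Tensor:
    # Branching on a tensor value forces a graph break: Dynamo cannot
    # trace a Python `if` over runtime data, so the graph splits here.
    if x.sum() > 0:
        return x * 2
    return x - 1

print(step(torch.ones(4)))
```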
I have to say, in many respects I've had more quality conversations in Berlin than in SF. Never has it happened in SF that people actually pull out pen and paper to talk precisely about hard concepts where surface-level talk is just not enough. The Overton window is wider and…
I realized at our Berlin event that there are a lot of talented and ambitious young people in Europe. Just (almost) no inspiring companies building the future, nor VCs with the balls to give them a chance. No wonder everybody wants to come to SF.