Vlado Boza
@bozavlado
second of his name. Destroyer of ML hype. I also enjoy making neural networks smaller. http://kaggle.com/usamec
One would think that Adafactor is just RMSProp with a rank-1 factorized scaling factor. But no. That thing has an LR scheduler, clipping, and scaling by parameter norm built in. And thus, no surprise, replacing PyTorch Adafactor with TIMM Adafactor leads to a total mess. Can…
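For context, here is a minimal sketch of the extra machinery layered on top of the factored second moment, assuming the behavior described in the original Adafactor paper (which TIMM's version follows more closely than torch.optim's). Function name, constants, and normalization details are illustrative, not taken from either implementation.

```python
import torch

def adafactor_like_update(param, grad, row_var, col_var, step,
                          eps=1e-30, clip_threshold=1.0, decay=0.8):
    # 1) Rank-1 factored second moment: the part people usually remember.
    beta2 = 1.0 - step ** (-decay)
    row_var.mul_(beta2).add_((grad ** 2 + eps).mean(dim=1), alpha=1 - beta2)
    col_var.mul_(beta2).add_((grad ** 2 + eps).mean(dim=0), alpha=1 - beta2)
    denom = (row_var.unsqueeze(1) / row_var.mean()) * col_var.unsqueeze(0)
    update = grad / denom.sqrt()

    # 2) Built-in clipping of the update by its RMS.
    rms = update.pow(2).mean().sqrt()
    update = update / torch.clamp(rms / clip_threshold, min=1.0)

    # 3) Built-in "LR schedule": a relative step size decaying with step count...
    rel_step = min(1e-2, 1.0 / (step ** 0.5))
    # 4) ...scaled by the RMS of the parameter itself.
    lr = rel_step * max(1e-3, param.pow(2).mean().sqrt().item())
    param.add_(update, alpha=-lr)
```

Swapping implementations silently swaps all of 2)-4) as well, not just the factored scaling, which is where the "total mess" tends to come from.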
One of the comments is equivalent to: "Your work is not novel. You are solving the graph 3-coloring problem, but [1,2,3,4] already solved the graph 2-coloring problem".
Got NeurIPS reviews. I want to start 3 out of 4 rebuttals with "You fucking idiot..."
NVIDIA driver and CUDA installs on Linux run better now than in 2015. Granted, the bar was low. But the last time I installed them, everything ran fine. That was a first. No need to mess with GRUB anymore.
I actually can't think of 1 tech product that works or runs better today than it did in 2015. They have all gotten worse right down to search bars.
This HRM thing is essentially a Perceiver (see image) with some more bells and whistles.
🚀Introducing Hierarchical Reasoning Model🧠🤖 Inspired by the brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, no pretraining or CoT! Unlock next AI breakthrough with…
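For reference, a minimal sketch of the Perceiver-style pattern being pointed at: a small set of learned latents repeatedly cross-attends to the input tokens and is refined by self-attention. Module names, sizes, and depth here are illustrative, not taken from the HRM code.

```python
import torch
import torch.nn as nn

class PerceiverLikeBlock(nn.Module):
    """Illustrative Perceiver-style block: latents cross-attend to inputs,
    then self-attend. Sizes are arbitrary, not HRM's."""
    def __init__(self, dim=256, num_latents=64, heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))

    def forward(self, inputs, n_iters=4):
        # inputs: (batch, seq_len, dim)
        z = self.latents.unsqueeze(0).expand(inputs.size(0), -1, -1)
        for _ in range(n_iters):
            # Latents read from the (larger) input via cross-attention...
            z = z + self.cross_attn(z, inputs, inputs)[0]
            # ...then refine themselves with self-attention and an MLP.
            z = z + self.self_attn(z, z, z)[0]
            z = z + self.ff(z)
        return z
```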
torch.compile is a mess even in single-GPU training, when you are not doing completely typical training.
AI researchers when they discovered that torch.compile doesn't scale well to real multi-node production training workloads and is a giant footgun
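A toy illustration of the kind of "not completely typical" pattern that tends to trip it up: data-dependent Python control flow usually forces graph breaks and recompilations under torch.compile. The function below is hypothetical, just a sketch of the failure mode.

```python
import torch

def step(x, w):
    y = x @ w
    # Branching on a tensor value is where torch.compile stops being a
    # transparent speedup: Dynamo cannot trace through this cleanly and
    # falls back to eager around the branch (a graph break).
    if y.mean() > 0:
        y = torch.relu(y)
    else:
        y = torch.tanh(y)
    return y

compiled_step = torch.compile(step)
x = torch.randn(8, 16)
w = torch.randn(16, 16)
out = compiled_step(x, w)  # still runs, but with graph breaks under the hood
```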
Back in the day, going from float32 to float64 meant that your GPU computation would be slow as hell. Now it is the same with going from bfloat16 to float32.
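A quick way to see this for yourself is a rough matmul micro-benchmark like the sketch below (requires a CUDA GPU; exact ratios depend on the hardware and on whether TF32 is enabled):

```python
import time
import torch

def time_matmul(dtype, n=8192, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.time() - start) / iters

# On recent GPUs, tensor cores give bfloat16 a large throughput edge over
# plain float32, much like float32 vs float64 on older hardware.
print("bf16:", time_matmul(torch.bfloat16))
print("fp32:", time_matmul(torch.float32))
```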
I do model compression and optimization. It is essential to have access to different GPUs, and that would be impossible without @vast_ai. Happy to finally meet you guys at #ICML2025. And thanks a lot for the Nintendo Switch!
The EU AI Act presumes a general-purpose AI model poses systemic risk once its training compute exceeds 1e25 FLOPs. What a crappy piece of legislation that thing is...
True, the first-ever application of Muon was to break the 3-second barrier in the CIFAR-10 speedrun. For perspective on scale, that was a 3e14 FLOP training run; @Kimi_Moonshot's K2 is 3e24 FLOPs, 10 orders of magnitude larger. x.com/_arohan_/statu…
RL is really sample efficient. We ran a small experiment on GeoGuessr. With just 16 images per country, Moondream performs as well as Claude Sonnet. With the full dataset, it beats Sonnet by a decent margin while being orders of magnitude cheaper to run.