Stephen Panaro
@flat
making coffee and other things. @BrewTimerApp
“We won’t run it in digital because we’re purists and maniacs.”
Incoming new coremltools looks like it has some nice bits:
- 8-bit input/output tensors (previously all 8-bit compute was kept internal)
- >1 input can use enumerated shapes (👀 ANE)
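Total guess at how that might surface through the existing conversion API, nothing confirmed: the int8 dtypes on inputs/outputs and enumerated shapes on more than one input are the new bits; the model, names, and shapes here are placeholders.

```python
import coremltools as ct
import numpy as np
import torch
import torch.nn as nn

# Placeholder: a tiny two-input model, just to have >1 input to convert.
class TwoInput(nn.Module):
    def forward(self, a, b):
        return a + b

traced = torch.jit.trace(
    TwoInput().eval(), (torch.zeros(1, 64), torch.zeros(1, 64))
)

mlmodel = ct.convert(
    traced,
    inputs=[
        # Hypothetical: enumerated shapes on more than one input,
        # and int8 dtypes exposed at the model boundary.
        ct.TensorType(name="a",
                      shape=ct.EnumeratedShapes(shapes=[(1, 64), (1, 128), (1, 256)]),
                      dtype=np.int8),
        ct.TensorType(name="b",
                      shape=ct.EnumeratedShapes(shapes=[(1, 64), (1, 128), (1, 256)]),
                      dtype=np.int8),
    ],
    outputs=[ct.TensorType(name="out", dtype=np.int8)],
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # nudge it toward the ANE
)
```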
Wonder if we’re gonna get a new version of coremltools. Last year it dropped on Monday.
WWDC wishes (all long shots):
- low-level ANE access (a la kernels)
- actual quantized activations (for KV cache)
- CoreML fast Hadamard transform (quick NumPy reference below)
- share weights between CoreML and MLX (or MLX ANE backend)
- ANE HW metrics: GB/s, FLOPs
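For the Hadamard wish, a quick NumPy reference for what the op would compute; no relation to any real CoreML API, just the unnormalized transform for power-of-two lengths.

```python
import numpy as np
from scipy.linalg import hadamard

def fwht(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform over the last axis.

    Unnormalized (+/-1 entries); length must be a power of two.
    O(n log n) instead of the O(n^2) dense matmul with H_n.
    """
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    if n == 1:
        return x
    a, b = x[..., : n // 2], x[..., n // 2:]
    fa, fb = fwht(a), fwht(b)
    # Sylvester construction: H_n = [[H, H], [H, -H]]
    return np.concatenate([fa + fb, fa - fb], axis=-1)

# Sanity check against the dense Hadamard matrix.
x = np.random.default_rng(0).normal(size=(4, 64)).astype(np.float32)
assert np.allclose(fwht(x), x @ hadamard(64), atol=1e-3)
```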
Wondering if the tiny codebook (16 elements) opens any opportunities for GPU kernels (or if the scaling vectors negate it).
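For context, roughly the structure I mean, sketched in NumPy (dims and names made up): dequant is a 16-entry table lookup plus a per-row scale, so the LUT itself trivially fits in GPU registers/shared memory; the question is whether the extra scaling multiply eats the win.

```python
import numpy as np

rng = np.random.default_rng(0)
out_dim, in_dim = 256, 512

lut = np.sort(rng.normal(size=16)).astype(np.float32)               # shared 16-entry codebook
idx = rng.integers(0, 16, size=(out_dim, in_dim), dtype=np.uint8)   # 4-bit indices
row_scale = rng.uniform(0.5, 1.5, size=(out_dim, 1)).astype(np.float32)  # per-row scaling vector

# Dequant = table lookup, then per-row scale.
W = row_scale * lut[idx]

x = rng.normal(size=(in_dim,)).astype(np.float32)
y = W @ x  # reference matvec against the dequantized weights
```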
Figured out 4-bit /per-tensor/ quantization for Qwen2.5-0.5B. It’s on par with per-group GPTQ, which is kinda cool (tbh non-uniform helps a lot). 🖇️ Weights, evals, more details below.
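(Not the actual recipe from the linked post; just a minimal sketch of what non-uniform /per-tensor/ 4-bit means: one 16-entry codebook shared by every weight in the tensor, fit here with plain 1-D k-means.)

```python
import numpy as np

def kmeans_1d(values, k=16, iters=25):
    """Tiny 1-D k-means (Lloyd's algorithm) to fit a non-uniform codebook."""
    # Initialize centroids at evenly spaced quantiles of the data.
    centroids = np.quantile(values, np.linspace(0, 1, k)).astype(np.float32)
    for _ in range(iters):
        idx = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            sel = values[idx == j]
            if sel.size:
                centroids[j] = sel.mean()
    return centroids

def quantize_per_tensor(w, k=16):
    """Non-uniform per-tensor quantization: one shared k-entry codebook."""
    flat = w.reshape(-1).astype(np.float32)
    codebook = kmeans_1d(flat, k)
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1).astype(np.uint8)
    return codebook, idx.reshape(w.shape)

# Toy example on a random "weight" matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(128, 256)).astype(np.float32)
codebook, idx = quantize_per_tensor(w)
w_hat = codebook[idx]
print("rmse:", np.sqrt(np.mean((w - w_hat) ** 2)))
```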