Stephen Panaro
@flat
making coffee and other things. @BrewTimerApp
“We won’t run it in digital because we’re purists and maniacs.”
Incoming new coremltools looks like it has some nice bits:
- 8-bit input/output tensors (previously all 8-bit compute was kept internal)
- >1 input can use enumerated shapes (👀 ANE)
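Total guess at how that might surface through the existing conversion API, nothing confirmed: the int8 dtypes on inputs/outputs and enumerated shapes on more than one input are the new bits; the model, names, and shapes here are placeholders.

```python
import coremltools as ct
import numpy as np
import torch
import torch.nn as nn

# Placeholder: a tiny two-input model, just to have >1 input to convert.
class TwoInput(nn.Module):
    def forward(self, a, b):
        return a + b

traced = torch.jit.trace(
    TwoInput().eval(), (torch.zeros(1, 64), torch.zeros(1, 64))
)

mlmodel = ct.convert(
    traced,
    inputs=[
        # Hypothetical: enumerated shapes on more than one input,
        # and int8 dtypes exposed at the model boundary.
        ct.TensorType(name="a",
                      shape=ct.EnumeratedShapes(shapes=[(1, 64), (1, 128), (1, 256)]),
                      dtype=np.int8),
        ct.TensorType(name="b",
                      shape=ct.EnumeratedShapes(shapes=[(1, 64), (1, 128), (1, 256)]),
                      dtype=np.int8),
    ],
    outputs=[ct.TensorType(name="out", dtype=np.int8)],
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # nudge it toward the ANE
)
```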
Wonder if we’re gonna get a new version of coremltools. Last year it dropped on Monday.
WWDC wishes (all long shots):
- low-level ANE access (a la kernels)
- actual quantized activations (for KV cache)
- CoreML fast Hadamard transform (quick NumPy reference below)
- share weights between CoreML and MLX (or MLX ANE backend)
- ANE HW metrics: GB/s, FLOPs
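For the Hadamard wish, a quick NumPy reference for what the op would compute; no relation to any real CoreML API, just the unnormalized transform for power-of-two lengths.

```python
import numpy as np
from scipy.linalg import hadamard

def fwht(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform over the last axis.

    Unnormalized (+/-1 entries); length must be a power of two.
    O(n log n) instead of the O(n^2) dense matmul with H_n.
    """
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    if n == 1:
        return x
    a, b = x[..., : n // 2], x[..., n // 2:]
    fa, fb = fwht(a), fwht(b)
    # Sylvester construction: H_n = [[H, H], [H, -H]]
    return np.concatenate([fa + fb, fa - fb], axis=-1)

# Sanity check against the dense Hadamard matrix.
x = np.random.default_rng(0).normal(size=(4, 64)).astype(np.float32)
assert np.allclose(fwht(x), x @ hadamard(64), atol=1e-3)
```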
Wondering if the tiny codebook (16 elements) opens any opportunities for GPU kernels (or if the scaling vectors negate it).
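For context, roughly the structure I mean, sketched in NumPy (dims and names made up): dequant is a 16-entry table lookup plus a per-row scale, so the LUT itself trivially fits in GPU registers/shared memory; the question is whether the extra scaling multiply eats the win.

```python
import numpy as np

rng = np.random.default_rng(0)
out_dim, in_dim = 256, 512

lut = np.sort(rng.normal(size=16)).astype(np.float32)               # shared 16-entry codebook
idx = rng.integers(0, 16, size=(out_dim, in_dim), dtype=np.uint8)   # 4-bit indices
row_scale = rng.uniform(0.5, 1.5, size=(out_dim, 1)).astype(np.float32)  # per-row scaling vector

# Dequant = table lookup, then per-row scale.
W = row_scale * lut[idx]

x = rng.normal(size=(in_dim,)).astype(np.float32)
y = W @ x  # reference matvec against the dequantized weights
```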
Figured out 4-bit /per-tensor/ quantization for Qwen2.5-0.5B. It’s on par with per-group GPTQ, which is kinda cool (tbh non-uniform helps a lot). 🖇️ Weights, evals, more details below.
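(Not the actual recipe from the linked post; just a minimal sketch of what non-uniform /per-tensor/ 4-bit means: one 16-entry codebook shared by every weight in the tensor, fit here with plain 1-D k-means.)

```python
import numpy as np

def kmeans_1d(values, k=16, iters=25):
    """Tiny 1-D k-means (Lloyd's algorithm) to fit a non-uniform codebook."""
    # Initialize centroids at evenly spaced quantiles of the data.
    centroids = np.quantile(values, np.linspace(0, 1, k)).astype(np.float32)
    for _ in range(iters):
        idx = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            sel = values[idx == j]
            if sel.size:
                centroids[j] = sel.mean()
    return centroids

def quantize_per_tensor(w, k=16):
    """Non-uniform per-tensor quantization: one shared k-entry codebook."""
    flat = w.reshape(-1).astype(np.float32)
    codebook = kmeans_1d(flat, k)
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1).astype(np.uint8)
    return codebook, idx.reshape(w.shape)

# Toy example on a random "weight" matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(128, 256)).astype(np.float32)
codebook, idx = quantize_per_tensor(w)
w_hat = codebook[idx]
print("rmse:", np.sqrt(np.mean((w - w_hat) ** 2)))
```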