vik
@vikhyatk
teaching computers how to see @moondreamai
New Moondream 2B release!
🌛 visual reasoning (!!)
🌜 better object detection
🌛 improved UI understanding
🌜 40% faster text generation
this is impressive
[1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.
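The thread says the weights are spectrally regulated but doesn't show the mechanism. One minimal form of spectral regulation is hard-capping each weight matrix's top singular value after the optimizer step — a sketch of that idea only, not the paper's actual method (`cap_spectral_norm` and `sigma_max` are my own names):

```python
import torch

@torch.no_grad()
def cap_spectral_norm(w: torch.Tensor, sigma_max: float = 1.0) -> torch.Tensor:
    # spectral norm = largest singular value of the weight matrix
    sigma = torch.linalg.matrix_norm(w, ord=2)
    # rescale in place only when the cap is exceeded, so the Lipschitz
    # constant of the corresponding linear map stays <= sigma_max
    if sigma > sigma_max:
        w.mul_(sigma_max / sigma)
    return w
```

applied to every weight matrix each step, this bounds each layer's Lipschitz constant — the property the thread credits for stability without layer norm, QK norm, or logit softcapping.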
anyone try implementing qk-clipping yet? i don't really want to materialize QKᵀ... flex_attention score_mod can't be used to write values out... is there an easy/cheap way to extract the per-head max attn logit?
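for what it's worth, one way to get the per-head max logit without materializing the full QKᵀ is a running max over key chunks — a plain-torch sketch outside flex_attention (function name and chunk size are my own, not a real API; softmax scale omitted since it's a constant per head):

```python
import torch

def per_head_max_logit(q: torch.Tensor, k: torch.Tensor, chunk: int = 256) -> torch.Tensor:
    # q, k: (batch, heads, seq, head_dim)
    # keep a running max over key chunks so the full (seq, seq)
    # score matrix is never materialized at once
    running = torch.full(q.shape[:-1], float("-inf"), dtype=q.dtype)
    for i in range(0, k.shape[-2], chunk):
        scores = q @ k[..., i : i + chunk, :].transpose(-2, -1)  # (b, h, seq, chunk)
        running = torch.maximum(running, scores.amax(dim=-1))
    return running.amax(dim=(0, -1))  # reduce over batch and queries -> (heads,)
```

it's an extra pass over K rather than something you get for free inside the attention kernel, but the chunked matmuls are cheap relative to the forward pass if you only run it when you want to monitor the logits.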
my only problem is that i have 4000 tabs open in my browser right now. if i could fix that i would become a 10x engineer
using sonnet to write a pytorch module: $0.038
using sonnet to write a react component: $33.74
if you're using dcp.save switch to .async_save today. i'd been putting it off for a while but it's like one line of code. seems to be a free lunch as far as i can tell
[1/6] Curious about Muon, but not sure where to start? I wrote a 3-part blog series called “Understanding Muon” designed to get you up to speed—with The Matrix references, annotated source code, and thoughts on where Muon might be going.
my resting heart rate has increased by 12% ever since yabai's focus-follows-mouse setting stopped working on my laptop
when someone says my code is "production ready" i'm never sure whether they mean it's really good or just good enough
using a rented remote gpu: “this gpu is but an abstraction, a concept; it does not exist in the physical plane”
using my own 3090: “omg ru ok i heard ur coil whine a bit louder, its kinda hot today let’s save inference for tomorrow”
if you’ve ever wanted to watch a movie but only have 15 minutes, there are youtube channels that will summarize the highlights for you. a happy medium between reading the plot on wikipedia and wasting your life watching the whole thing
