vik
@vikhyatk
teaching computers how to see @moondreamai
New Moondream 2B release!
🌛 visual reasoning (!!)
🌜 better object detection
🌛 improved UI understanding
🌜 40% faster text generation
this is impressive
[1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.
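The thread says the weights are spectrally regulated but doesn't show the mechanism. One minimal form of spectral regulation is hard-capping each weight matrix's top singular value after the optimizer step — a sketch of that idea only, not the paper's actual method (`cap_spectral_norm` and `sigma_max` are my own names):

```python
import torch

@torch.no_grad()
def cap_spectral_norm(w: torch.Tensor, sigma_max: float = 1.0) -> torch.Tensor:
    # spectral norm = largest singular value of the weight matrix
    sigma = torch.linalg.matrix_norm(w, ord=2)
    # rescale in place only when the cap is exceeded, so the Lipschitz
    # constant of the corresponding linear map stays <= sigma_max
    if sigma > sigma_max:
        w.mul_(sigma_max / sigma)
    return w
```

applied to every weight matrix each step, this bounds each layer's Lipschitz constant — the property the thread credits for stability without layer norm, QK norm, or logit softcapping.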
anyone try implementing qk-clipping yet? i don't really want to materialize QKᵀ... flex_attention score_mod can't be used to write values out... is there an easy/cheap way to extract the per-head max attn logit?
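for what it's worth, one way to get the per-head max logit without materializing the full QKᵀ is a running max over key chunks — a plain-torch sketch outside flex_attention (function name and chunk size are my own, not a real API; softmax scale omitted since it's a constant per head):

```python
import torch

def per_head_max_logit(q: torch.Tensor, k: torch.Tensor, chunk: int = 256) -> torch.Tensor:
    # q, k: (batch, heads, seq, head_dim)
    # keep a running max over key chunks so the full (seq, seq)
    # score matrix is never materialized at once
    running = torch.full(q.shape[:-1], float("-inf"), dtype=q.dtype)
    for i in range(0, k.shape[-2], chunk):
        scores = q @ k[..., i : i + chunk, :].transpose(-2, -1)  # (b, h, seq, chunk)
        running = torch.maximum(running, scores.amax(dim=-1))
    return running.amax(dim=(0, -1))  # reduce over batch and queries -> (heads,)
```

it's an extra pass over K rather than something you get for free inside the attention kernel, but the chunked matmuls are cheap relative to the forward pass if you only run it when you want to monitor the logits.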
my only problem is that i have 4000 tabs open in my browser right now. if i could fix that i would become a 10x engineer
using sonnet to write a pytorch module: $0.038
using sonnet to write a react component: $33.74
if you're using dcp.save switch to .async_save today. i'd been putting it off for a while but it's like one line of code. seems to be a free lunch as far as i can tell
[1/6] Curious about Muon, but not sure where to start? I wrote a 3-part blog series called “Understanding Muon” designed to get you up to speed—with The Matrix references, annotated source code, and thoughts on where Muon might be going.
my resting heart rate has increased by 12% ever since yabai's focus-follows-mouse setting stopped working on my laptop
when someone says my code is "production ready" i'm never sure whether they mean it's really good or just good enough
using a rented remote gpu: “this gpu is but an abstraction, a concept; it does not exist in the physical plane”
using my own 3090: “omg ru ok i heard ur coil whine a bit louder, its kinda hot today let’s save inference for tomorrow”
if you’ve ever wanted to watch a movie but only have 15 minutes, there are youtube channels that will summarize the highlights for you. a happy medium between reading the plot on wikipedia and wasting your life watching the whole thing
