Zygi
@nonagonono
Making computers solve problems we can't. Occasional cypherpunk. 🏳️‍🌈. DMs open @nonagon on bsky
i don’t understand why apple and google haven’t shipped cryptographic photo/video attestation. have they communicated anything about intentionally not doing it?
did u know you can use the new Gemini image segmentation feature in… a lot of different ways
will contacts suddenly become much more fashionable?
introducing Waves, camera glasses for creators. record in stealth. livestream all day. pre-order now.
Big day for Lean! Alex Gerko of XTX Markets is donating $10M to the Lean FRO and the new Mathlib Initiative to support the future of formal mathematics and machine-checked proofs. Thank you, Alex Gerko and Convergent Research, for believing in the mission. Read the full…
it's kinda weird that keyboards still don't have a dedicated button to mute/unmute the mic. it's prob the non-default action I use most frequently, with the relevant app out of graphical context, and the first custom keybinding I make in discord and other chat apps.
Fun fact: When you're using a good learning rate, the gradient should be almost *perpendicular* to the direction of the last step (opposite of the intuition of many gradient descent diagrams that make it look like the gradient is following a smooth path). You can derive this by…
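A minimal sketch of the perpendicularity claim, not the derivation the truncated post points to. It assumes a toy quadratic loss and takes "good learning rate" to mean the exact line-search step along the negative gradient, for which the new gradient is orthogonal to the step just taken. The matrix `A`, the starting point `x0`, and the helper names are all hypothetical.

```python
import math

# Toy quadratic loss f(x) = 0.5 * x^T A x with diagonal A (assumed for illustration).
A = [1.0, 10.0, 100.0]
x0 = [1.0, 1.0, 1.0]

def grad(x):
    # Gradient of the quadratic is A x (elementwise, since A is diagonal).
    return [a * xi for a, xi in zip(A, x)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def cos_after_step(x, lr):
    # Take one gradient step, then compare the new gradient with that step.
    g = grad(x)
    step = [-lr * gi for gi in g]
    x_new = [xi + si for xi, si in zip(x, step)]
    return cosine(grad(x_new), step)

g0 = grad(x0)
# Exact line-search learning rate for a quadratic: (g.g) / (g.A g).
lr_best = dot(g0, g0) / dot(g0, [a * gi for a, gi in zip(A, g0)])

cos_best = cos_after_step(x0, lr_best)         # ~0: gradient perpendicular to the step
cos_small = cos_after_step(x0, 0.1 * lr_best)  # negative: gradient still opposes the step
print(cos_best, cos_small)
```

With a learning rate well below the line-search value, the new gradient stays nearly anti-parallel to the step, which is the smooth-path picture the post argues against.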
There is a more general version of this question: why not scale up the parameters of the attention operation and make it more expressive? (you can do it as suggested below, or simply increase the dimension of QKV) The empirical answer is that it’s not nearly as effective as…
What happens if you Q,K,V = mlp(x).split(3) instead of linear(x).split(3) ? Anyone tried this?
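A rough pure-Python sketch of the two projections being compared, for a single token vector. It reads `.split(3)` as "split into three equal parts" (in PyTorch that would be `chunk(3, dim=-1)`, since `split(3)` yields chunks of size 3). The width `d`, the 4x hidden size, the ReLU, and all function names are assumptions for illustration, not anything specified in the posts.

```python
import random

random.seed(0)
d = 4  # model width (hypothetical)

def make_weights(n_in, n_out):
    # Random n_out x n_in weight matrix; biases omitted for brevity.
    scale = 1 / n_in ** 0.5
    return [[random.gauss(0, scale) for _ in range(n_in)] for _ in range(n_out)]

def linear(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, vi) for vi in v]

def split3(v):
    # Split a vector of length 3n into three vectors of length n.
    n = len(v) // 3
    return v[:n], v[n:2 * n], v[2 * n:]

W_qkv = make_weights(d, 3 * d)   # standard fused QKV projection
W1 = make_weights(d, 4 * d)      # MLP variant: hidden layer ...
W2 = make_weights(4 * d, 3 * d)  # ... then project to 3d

def qkv_linear(x):
    return split3(linear(W_qkv, x))

def qkv_mlp(x):
    return split3(linear(W2, relu(linear(W1, x))))

x = [random.gauss(0, 1) for _ in range(d)]
q_lin, k_lin, v_lin = qkv_linear(x)
q_mlp, k_mlp, v_mlp = qkv_mlp(x)
```

Both variants return Q, K, V of the same shape, so the rest of the attention operation is unchanged; only the parameter count and nonlinearity of the projection differ.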
Laker and I are presenting this work in an hour at ICML poster E-2103. It’s on a theoretical framework and language (modula) for optimizers that are fast (like Shampoo) and scalable (like muP). You can think of modula as Muon extended to general layer types and network topologies
every two months or so i buy sour candy, eat it, my tongue burns and blisters, i swear to not eat it again, but then my tongue heals up, my resolve weakens and the cycle repeats. why does sour taste so good
ive already solved the problem in my head wdym i have to set infra and write code for it now it’s already solved