tokenbender
@tokenbender
playing reward lottery • child of oss • chaotic neutral
e/xperiments philosophy:
* No muh-favourite-architecture
* Stay GPU-poor, stay foolish (literally)
* Forever behind SoTA, always learning
* Everyone sleeps on smol models
* Data curation/evaluation is the MOAT
* Synthetic dataset creation is art

bro made the hardest PI ad I've ever seen
Unnecessary Logging That Goes Hard
AFFIRM: I will maximise the discounted expected value of all my futures.
AFFIRM: I will realize all of my speculative futures.
attention as a "learned function of what to focus on" is a gift that keeps on giving. all the heavy lifting in hybrids is done via that. attention-less architectures suck despite copying everything else from transformers. malleability of ANY objective lies there.
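for reference, that "learned function of what to focus on" is just the data-dependent softmax over query-key scores; a minimal numpy sketch of scaled dot-product attention (a generic illustration, not any particular hybrid's code):

```python
import numpy as np

def attention(q, k, v):
    # q: (n, d) queries, k: (m, d) keys, v: (m, d_v) values
    scores = q @ k.T / np.sqrt(q.shape[-1])            # similarity of each query to each key
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)          # softmax: the learned "where to focus"
    return weights @ v                                 # weighted mix of values

rng = np.random.default_rng(0)
out = attention(rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8)))
print(out.shape)  # (4, 8)
```

the point: the mixing weights are recomputed per input, so the model decides at runtime what to read from, instead of baking a fixed mixing pattern into the weights.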
Is there some low-rank + MDL/Occam's argument to explain why in-context learning (and it sort of being a low-rank update) is able to cause very efficient shifts in behavior vs SFT?
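one way to make the "low-rank update" part concrete, under the linear-attention simplification (a sketch, not a proof):

```latex
% linear-attention view: the N context pairs (k_i, v_i) act as an additive,
% rank-limited update to an implicit weight matrix applied to the query q
\[
  y \;=\; \Big(W_0 + \underbrace{\sum_{i=1}^{N} v_i k_i^{\top}}_{\Delta W,\ \operatorname{rank}(\Delta W)\,\le\, N}\Big)\, q
\]
% MDL/Occam reading: \Delta W is pinned down by the N in-context pairs,
% i.e. O(Nd) numbers, versus O(d^2) effective degrees of freedom for a full
% SFT update -- a much shorter "description" for a comparable behavior shift.
```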
expect GPT5 to be an innovation that eventually obsoletes the model picker.
OpenAI's podcast on GPT-4.5 described a 2-year process, though they don't disclose the exact training run duration. And yet GPT-5 comes mere months later. This kind of suggests they did something different for GPT-5, no? More efficient arch? Maybe fine-grained MoE? Guesses?
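for anyone unfamiliar, "fine-grained MoE" usually means many small experts with sparse top-k routing, so active params per token stay low; a toy numpy sketch of the routing idea (my reading of the term, nothing confirmed about GPT-5):

```python
import numpy as np

def fine_grained_moe(x, router_w, experts_w, k=4):
    """Route one token to k of E small experts; only those k experts' params are 'active'."""
    logits = router_w @ x                               # (E,) routing scores (stand-in for a learned router)
    topk = np.argsort(logits)[-k:]                      # indices of the k best-scoring experts
    gates = np.exp(logits[topk] - logits[topk].max())
    gates /= gates.sum()                                # softmax over the selected experts only
    return sum(g * (experts_w[e] @ x) for g, e in zip(gates, topk))

rng = np.random.default_rng(0)
d, E = 16, 64                                           # "fine-grained": many (64) small experts
out = fine_grained_moe(rng.normal(size=d),
                       rng.normal(size=(E, d)),
                       rng.normal(size=(E, d, d)) * 0.05,
                       k=4)
print(out.shape)  # (16,)
```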
did you maximize the future light cone of neurotransmitter tonnage today anon?
it's been probably 5 months now since I've bookmarked anything. it's great.
Japanese scientists successfully removed the extra chromosome causing Down syndrome in lab cells.
i probably haven't endured any other ai product as much as claude code. ~ 1 tok/s output :(
