tokenbender
@tokenbender
playing reward lottery • child of oss • chaotic neutral
e/xperiments philosophy:
* No muh-favourite-architecture
* Stay GPU-poor, stay foolish (literally)
* Forever behind SoTA, always learning
* Everyone sleeps on smol models
* Data curation/evaluation is the MOAT
* Synthetic dataset creation is art

bro made the hardest PI ad I've ever seen
Unnecessary Logging That Goes Hard
AFFIRM: I will maximise the discounted expected value of all my futures.
AFFIRM: I will realize all of my speculative futures.
attention as a "learned function of what to focus on" is a gift that keeps on giving. all the heavy lifting in hybrids is done via that. attention-less architectures suck despite copying everything else from transformers. malleability of ANY objective lies there.
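for reference, that "learned function of what to focus on" is just the data-dependent softmax over query-key scores; a minimal numpy sketch of scaled dot-product attention (a generic illustration, not any particular hybrid's code):

```python
import numpy as np

def attention(q, k, v):
    # q: (n, d) queries, k: (m, d) keys, v: (m, d_v) values
    scores = q @ k.T / np.sqrt(q.shape[-1])            # similarity of each query to each key
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)          # softmax: the learned "where to focus"
    return weights @ v                                 # weighted mix of values

rng = np.random.default_rng(0)
out = attention(rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8)))
print(out.shape)  # (4, 8)
```

the point: the mixing weights are recomputed per input, so the model decides at runtime what to read from, instead of baking a fixed mixing pattern into the weights.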
Is there some low-rank + MDL/Occam's argument to explain why in-context learning (and it sort of being a low-rank update) is able to cause very efficient shifts in behavior vs SFT?
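one way to make the "low-rank update" part concrete, under the linear-attention simplification (a sketch, not a proof):

```latex
% linear-attention view: the N context pairs (k_i, v_i) act as an additive,
% rank-limited update to an implicit weight matrix applied to the query q
\[
  y \;=\; \Big(W_0 + \underbrace{\sum_{i=1}^{N} v_i k_i^{\top}}_{\Delta W,\ \operatorname{rank}(\Delta W)\,\le\, N}\Big)\, q
\]
% MDL/Occam reading: \Delta W is pinned down by the N in-context pairs,
% i.e. O(Nd) numbers, versus O(d^2) effective degrees of freedom for a full
% SFT update -- a much shorter "description" for a comparable behavior shift.
```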
expect GPT5 to be an innovation that eventually obsoletes the model picker.
OpenAI's podcast on GPT-4.5 described a 2-year process, though they don't disclose the exact training run duration. And yet GPT-5 comes mere months later. This kind of suggests they did something different for GPT-5, no? More efficient arch? Maybe fine-grained MoE? Guesses?
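for anyone unfamiliar, "fine-grained MoE" usually means many small experts with sparse top-k routing, so active params per token stay low; a toy numpy sketch of the routing idea (my reading of the term, nothing confirmed about GPT-5):

```python
import numpy as np

def fine_grained_moe(x, router_w, experts_w, k=4):
    """Route one token to k of E small experts; only those k experts' params are 'active'."""
    logits = router_w @ x                               # (E,) routing scores (stand-in for a learned router)
    topk = np.argsort(logits)[-k:]                      # indices of the k best-scoring experts
    gates = np.exp(logits[topk] - logits[topk].max())
    gates /= gates.sum()                                # softmax over the selected experts only
    return sum(g * (experts_w[e] @ x) for g, e in zip(gates, topk))

rng = np.random.default_rng(0)
d, E = 16, 64                                           # "fine-grained": many (64) small experts
out = fine_grained_moe(rng.normal(size=d),
                       rng.normal(size=(E, d)),
                       rng.normal(size=(E, d, d)) * 0.05,
                       k=4)
print(out.shape)  # (16,)
```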
did you maximize the future light cone of neurotransmitter tonnage today anon?
it's been probably 5 months now since I've bookmarked anything. it's great.
Japanese scientists successfully removed the extra chromosome causing Down syndrome in lab cells.
i probably haven't endured any other ai product as much as claude code. ~ 1 tok/s output :(
