theseriousadult
@gallabytes
father, ML enjoyer. building agents @cursor_ai. @midjourney v2-7.
as test time training mechanisms mature, we're going to need continual learning benchmarks. I think the most obvious one is language transfer:
- train entirely in English
- eval entirely in some other language
- eval is a single serial pass through the dataset with TTT only
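a minimal sketch of what that eval loop could look like, assuming a generic `model.loss` scorer and a `ttt_step` update hook (both hypothetical names): each example is scored *before* the model adapts on it, in one serial pass, so the metric reflects transfer speed rather than memorization.

```python
# minimal sketch of the proposed protocol (hypothetical names):
# the model was trained only on English; we stream a non-English corpus
# once, scoring each batch before doing a test-time-training update on it.
def continual_language_transfer_eval(model, non_english_stream, ttt_step):
    total_loss, n = 0.0, 0
    for batch in non_english_stream:      # single serial pass, no shuffling
        total_loss += model.loss(batch)   # score first...
        n += 1
        ttt_step(model, batch)            # ...then adapt on the same batch
    return total_loss / n
```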
hmm this is probably optimal on torus topologies like TPU
you know we could do mixture of mixture of experts, right? (MoMoE)
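a toy sketch of one way to read "MoMoE", purely illustrative (all names hypothetical): an outer router picks among inner MoE blocks, each with its own router and experts, giving two levels of top-1 routing.

```python
# purely illustrative nested routing -- not a real implementation
import numpy as np

def top1(logits):
    return int(np.argmax(logits))

def momoe(x, outer_router, inner_moes):
    # inner_moes: list of (router, experts) pairs, each an ordinary MoE
    router, experts = inner_moes[top1(outer_router(x))]  # pick an inner MoE
    return experts[top1(router(x))](x)                   # pick an expert within it
```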
v5 era MJ was a really magical time
My four candidates for great AI art
does anyone have a favorite (hosted) knowledge graph / semsearch / RAG MCP? I want something for Claude to keep persistent research notes in. I can just use Notion, but it would be nice to have semsearch / retrieval too.
strong agree. chatbots are severely hobbled by this too. will fix soon.
Agents without memory of me and what I care about and all the context around me are just not as useful. We are so early it is not yet table stakes, but it will be.
a horse riding an astronaut by Kimi K2
a horse riding an astronaut, by r1-0528
a horse riding on top of an astronaut, by grok4. unlike other models, which did the thing on the first try, grok4 first confirmed that was exactly what I wanted & then, after confirmation, spent over a minute searching the web before outputting this.
a horse riding on top of an astronaut, by grok 3
I've been excited about unbalanced configs for a while - now make the encoder block-causal & you get big-model prefill, small-model decoding! (sketch after the quoted summary below)
T5Gemma: Google’s new encoder-decoder LLMs adapted from Gemma 2 via UL2/PrefixLM.
- 9B–2B “unbalanced” config hits top accuracy with low latency
- Outperforms Gemma 2 on GSM8K, DROP, MMLU
- 32 total models: 8 sizes × (pretrained + tuned) × (UL2 + PrefixLM)
Available on Hugging…
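a minimal sketch of why the unbalanced split pays off, with assumed `big_encoder` / `small_decoder` callables (hypothetical names, not T5Gemma's actual API): the big encoder runs once over the prompt, and every generated token then only costs a small-decoder step against the cached encoder states.

```python
# minimal sketch (hypothetical names, not T5Gemma's actual API): the big
# encoder reads the prompt once ("prefill"); the small decoder cross-attends
# to the cached encoder states, so per-token cost scales with the small model.
BOS, EOS = 1, 2  # placeholder special-token ids

def generate(big_encoder, small_decoder, prompt_ids, max_new_tokens=64):
    enc_states = big_encoder(prompt_ids)          # one expensive pass (prefill)
    out = [BOS]
    for _ in range(max_new_tokens):
        logits = small_decoder(out, enc_states)   # cheap per-token step
        nxt = int(logits[-1].argmax())            # greedy decode, for simplicity
        out.append(nxt)
        if nxt == EOS:
            break
    return out
```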
My hot take is that Claude should actually be more expensive than it is. aiui they are literally sold out of TPM. It should be auctioned.
Anthropic hasn't decreased pricing a single fucking time. I'm slowly turning bearish on Anthropic, especially with the looming Grok-4 and GPT-5 launches. But usually they launch a new model or product within a month of me turning bearish :)
is this just a scale issue? neural nets are somewhere between 0.2 and 2% of the parameter count of a human brain. think of them as extremely well-read house cats.
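rough arithmetic behind the 0.2–2% figure, assuming ~10^14 synapses as the brain's "parameter count" (a commonly cited ballpark, not a measurement) and frontier models in the ~200B–2T parameter range:

```python
# assuming ~1e14 synapses as the brain's "parameter count" (rough ballpark)
brain_params = 1e14
for model_params in (2e11, 2e12):                 # ~200B and ~2T param models
    print(f"{model_params / brain_params:.1%}")   # -> 0.2%, 2.0%
```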
chatgpt, claude, gemini, grok, etc have all read, comprehended, and nearly memorized every book in the world, and yet with current architectures and training techniques none of them have any truly novel knowledge to give us. really makes you think
this is all honestly quite devastating, the only advice I can offer is to read science fiction when young and impressionable. if you learn to dream of the world to come, it feels more like coming home than future shock
The new Google AI mode is surprisingly good and really fast. pretty great for factual questions with some simple conversational follow-ups.
nonsense. soon we will have infinite context. then it will be a treasure trove and you'll wish you kept more.
it takes two decades of storing everything in the cloud to realize that you don’t actually want infinite storage space
Grok : Reddit_2015 :: Claude : Tumblr/Ratcord :: ChatGPT : ???
I really want to see o3 pro native image gen. feel like it could plausibly make something this intricate. (it probably doesn't exist but it could & eventually will & it'll be great)
this as a self-modifying agent harness would go so hard.
This is diabolical... a Python object that hallucinates method implementations on demand any time you call them, using my LLM Python library github.com/awwaiid/gremllm
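a minimal sketch of the general trick (assumed names, not gremllm's actual API): intercept failed attribute lookups with `__getattr__`, ask an LLM to write the missing method, `exec` the result, and hand it back as a bound callable.

```python
# minimal sketch of the trick (assumed names, NOT gremllm's actual API):
# __getattr__ only fires when normal attribute lookup fails, so any unknown
# method name becomes a prompt; the generated source is exec'd and called.
class Gremlin:
    def __init__(self, llm):
        self._llm = llm  # any callable: prompt str -> Python source str

    def __getattr__(self, name):
        src = self._llm(
            f"Write a Python function `def {name}(self, *args, **kwargs):` "
            "and nothing else."
        )
        scope = {}
        exec(src, scope)   # trusting model-written code -- diabolical indeed
        fn = scope[name]
        return lambda *args, **kwargs: fn(self, *args, **kwargs)
```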