common prosperity
Wake up honey, new RWKV paper just dropped 🧵⤵️
Paper: arxiv.org/abs/2404.05892
Code: github.com/BlinkDL/RWKV-LM
Models: huggingface.co/RWKV (Apache 2.0 license)
(1/6)

Tough day for the Chicago boys
No way
Did you know “Kimi K2: Open Agentic Intelligence” has the abbreviation….🤫
"running on my very limited GPU access at FAIR (Meta)" 😭😭😭
No matter how AI evolves overnight (tech, career, how it may impact me), I remain committed to using the "physics of language models" approach to predict next-gen AI. Due to my limited GPU access at Meta, Part 4.1 (+ a new 4.2) is still in progress, but the results on Canon layers are shining.
🧵On Baselines in LLM Architecture Research, a Tale of DeltaNet and RWKV-7 (1) (full essay at github.com/BlinkDL/zoology)
RWKV7-G1 "GooseOne" 🪿 2.9B release: a pure RNN (attention-free) reasoning model trained on +5.2T tokens, comparable with Qwen2.5 3B / Llama3.2 3B, and fully multilingual. Chat demo & weights on RWKV.com. 7B training in progress.
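The point of "pure RNN (attention-free)" is that decoding carries a fixed-size recurrent state instead of a growing KV cache. Below is a minimal toy sketch of that property, assuming a generic tanh recurrence rather than RWKV-7's actual update rule; all names are illustrative.

```python
# Toy sketch (NOT RWKV-7's actual update rule): why an attention-free RNN
# decodes with a fixed-size state instead of a growing KV cache.
import numpy as np

def toy_rnn_step(state, x, W_state, W_in):
    """One recurrent step: the new state depends only on the previous state
    and the current token embedding, so memory stays constant per token."""
    return np.tanh(state @ W_state + x @ W_in)

d = 8
rng = np.random.default_rng(0)
W_state = rng.normal(size=(d, d)) * 0.1
W_in = rng.normal(size=(d, d)) * 0.1

state = np.zeros((1, d))
for token_embedding in rng.normal(size=(16, 1, d)):  # a 16-token sequence
    state = toy_rnn_step(state, token_embedding, W_state, W_in)
print(state.shape)  # (1, 8): one fixed-size state carried forward, no per-token cache
```

However long the sequence grows, the only thing carried between tokens is that single d-dimensional state vector.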
So real, and so sad. Reasonable prevention is the best individual action while we work to remove this burden.
Yes. Additionally, you're considered creepy if you find this endless carnage objectionable. Normie women will get the ick, e/accs will say your job is to burn out faster like good fuel, safetyists will call you egoistic, theists will peddle cope. Onto the conveyor belt, meat.
AGI will be built on the concept of equivalence classes. You will not build an AGI until you figure out how to approximate the partition functions of energy-based models. A partition function is fundamentally a sum over equivalence classes.
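A minimal numeric illustration of that last claim, using a toy 4-spin energy-based model (the energy function here is invented for the example): the partition function computed state by state equals the same sum regrouped over equivalence classes of states that share an energy level.

```python
# Minimal sketch: the partition function of a toy energy-based model,
# computed (a) as a sum over all states and (b) as a sum over equivalence
# classes of states that share the same energy, weighted by class size.
import itertools
import math
from collections import Counter

def energy(state):
    # Toy energy: number of "up" spins in a 4-spin configuration.
    return sum(state)

states = list(itertools.product([0, 1], repeat=4))

# (a) brute-force sum over all 16 states
Z_states = sum(math.exp(-energy(s)) for s in states)

# (b) group states into equivalence classes by energy level:
# each class contributes (class size) * exp(-E)
class_sizes = Counter(energy(s) for s in states)
Z_classes = sum(n * math.exp(-e) for e, n in class_sizes.items())

print(Z_states, Z_classes)  # identical up to float rounding
```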
Ok real talk. Say you have 2 weeks to live. Normal people would spend time with family, but you have no life, so you decide to burn GPUs instead. King Jensen gave you all the IB cluster you need. The Angels of Data give you a magical S3 bucket that has any data you can imagine.…
Unfortunately, bitnet-b1.58-2B-4T is not looking good either 🙃 Please test your model before release: huggingface.co/spaces/Jellyfi…
The "Uncheatable Eval" is good at detecting model quality. For example, our community noticed a 1.58-bit 500MB-param model "Bonsai" on HF with decent evals; it turned out to be evalmaxxing 🙃 Please test your model before release.
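For anyone who wants the spirit of such a check, here is a hedged sketch (not the actual Uncheatable Eval code) of scoring a model by bits per byte on freshly published text it could not have trained on; `gpt2` and the text string are placeholders.

```python
# Hedged sketch of a compression-style sanity check (not the actual
# Uncheatable Eval code): score a model by bits per byte on fresh text
# it cannot have seen during training. Model name and text are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; swap in the model you want to check
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

fresh_text = "Paste recently published text here (new arXiv abstract, news, etc.)."
enc = tok(fresh_text, return_tensors="pt")

with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])

# out.loss is the mean cross-entropy (in nats) over the predicted positions;
# convert the total to bits and normalize by the UTF-8 byte count.
n_predicted = enc["input_ids"].numel() - 1
n_bytes = len(fresh_text.encode("utf-8"))
bits_per_byte = out.loss.item() * n_predicted / math.log(2) / n_bytes
print(f"{bits_per_byte:.3f} bits/byte (lower is better)")
```

Fresh data is the point: recently published text cannot be in the training set, so the score is hard to evalmaxx.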
A Dostoyevsky line has never hit me so hard
Don't be rude. Every imposition devoid of grace is doomed to collapse.
Cost profit margin 545%
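Assuming the usual profit-over-cost definition (the tweet doesn't spell it out), the arithmetic behind that headline number:

```latex
\text{cost profit margin} \;=\; \frac{\text{revenue} - \text{cost}}{\text{cost}} \;=\; 545\%
\quad\Longrightarrow\quad \text{revenue} \;\approx\; 6.45 \times \text{cost}
```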
🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview
Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing
Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k…
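Of those three ingredients, computation-communication overlap is the easiest to illustrate in isolation. Here is a generic sketch (not DeepSeek's system; both `fake_*` functions are stand-ins) of prefetching the next batch's communication while computing on the current one, so wall-clock time approaches max(compute, comm) instead of their sum.

```python
# Generic sketch of computation-communication overlap (not DeepSeek's code):
# launch the "communication" for batch i+1 while computing on batch i.
import concurrent.futures
import time

def fake_comm(batch_id):
    time.sleep(0.05)          # stands in for a cross-node all-to-all
    return f"activations_{batch_id}"

def fake_compute(data):
    time.sleep(0.05)          # stands in for expert FFN compute
    return f"output_from_{data}"

start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    next_data = pool.submit(fake_comm, 0)              # prefetch batch 0
    for i in range(8):
        data = next_data.result()                      # wait for comm of batch i
        if i + 1 < 8:
            next_data = pool.submit(fake_comm, i + 1)  # overlap comm of batch i+1
        fake_compute(data)                             # ... with compute of batch i
print(f"elapsed ≈ {time.time() - start:.2f}s (vs ~0.80s fully serial)")
```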