BlinkDL
@BlinkDL_AI
RWKV = 100% RNN with GPT-level performance. https://lfaidata.foundation/projects/rwkv and https://github.com/search?o=desc&q=rwkv&s=updated&type=Repositories
RNN+Pretrain+Scaling is all you need. Introducing RWKV-7 G0 🪿 7.2B, the strongest pure RNN reasoning model (can self-correct math mistakes). Download & Details: github.com/BlinkDL/RWKV-L… and it's only +2T tokens - I am training stronger RNNs🙂
RWKV7-G1 "GooseOne" 🪿 2.9B release: pure RNN (attention-free) reasoning model, +5.2T tokens, comparable with Qwen2.5 3B / Llama3.2 3B and fully multilingual. Chat demo & weights on RWKV.com 7B training in progress.
Songlin blocked me on X and banned me from the FLA discord. I guess she truly wants her side of the story to be the only one that's kept 🙃 You can't change history, can you?
So now Songlin is mad. It began when I saw an obviously wrong MQAR result for RWKV-7 posted by Songlin (see x.com/BlinkDL_AI/sta……). I told Songlin to use RWKV-LM, and got a very fierce reply in the official FLA group. Songlin pinned the personal attack for several days 🙃
So I tested some AIME problems (including a modified AIME 2025 question to detect memorization), and it's quite amazing that a pure RNN can solve the easy ones. So an RNN getting IMO gold is certainly possible after further scaling 🤣
p.s. I think arXiv papers can be the next source of reasoning data: (1) Locate difficult yet predictable tokens (2) Use them for RL (3) "Solving" papers will be more than enough to solve the badly-named "Humanity's Last Exam"🙂
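A minimal sketch of one way to read step (1), "locate difficult yet predictable tokens": treat a token as difficult if a small policy model assigns it low probability, and predictable if a stronger reference model assigns it high probability. This interpretation, the thresholds, and the HuggingFace-style `policy`/`reference` models returning `.logits` are all assumptions for illustration, not the author's recipe.

```python
import torch

# Hedged sketch of step (1): "locate difficult yet predictable tokens".
# Assumption: policy/reference are HuggingFace-style causal LMs whose
# forward pass returns an object with a .logits tensor of shape (B, T, V).
@torch.no_grad()
def find_candidate_tokens(policy, reference, input_ids,
                          hard_thresh=4.0, easy_thresh=0.7):
    logp_policy = torch.log_softmax(policy(input_ids).logits, dim=-1)
    logp_ref = torch.log_softmax(reference(input_ids).logits, dim=-1)
    targets = input_ids[:, 1:]
    # per-token NLL under the small policy model ("difficult")
    nll_policy = -logp_policy[:, :-1].gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # per-token probability under the stronger reference model ("predictable")
    p_ref = logp_ref[:, :-1].gather(-1, targets.unsqueeze(-1)).squeeze(-1).exp()
    # boolean mask over positions 1..T-1: hard for the policy, easy for the reference
    return (nll_policy > hard_thresh) & (p_ref > easy_thresh)
```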
And 15 new RWKV papers in June 🙂 check rwkv.com (108 RWKV papers in total now)
RWKV-8 "Heron" preview (2) - DeepEmbedAttention (DEA), particularly suitable for hybrid models (1/9 KV cache size of MLA). The goal of RWKV-8 is to achieve longctx with 0 KV cache, and I have some progress too🙂
You can add an empty "think" block for RWKV7-G1 to get a higher-quality response while saving tokens.
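A minimal sketch of what this could look like when building the prompt by hand; the exact chat template below (User:/Assistant: with a blank think block) is an assumption, so check the RWKV-LM repo for the canonical G1 format.

```python
# Hedged sketch: insert an empty <think></think> block after "Assistant:"
# so the model skips long chain-of-thought but still answers in its
# reasoning-mode style. Template details here are assumptions, not canonical.
def build_prompt(question: str, empty_think: bool = True) -> str:
    prompt = f"User: {question}\n\nAssistant:"
    if empty_think:
        prompt += " <think>\n</think>"  # model continues straight to the answer
    return prompt

print(build_prompt("What is 17 * 24?"))
```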
RWKV7-G1 "GooseOne" 🪿 2.9B release: pure RNN (attention-free) reasoning model, +5.2T tokens, comparable with Qwen2.5 3B / Llama3.2 3B and fully multilingual. Chat demo & weights on RWKV.com 7B training in progress.
RWKV papers on rwkv.com: 15 new in Apr/May 2025 🔥 DualComp uses RWKV-7 for efficient compression, and RWKVQuant reaches 3.275-bit quantization. RWKV-7 "Goose" 🪿 is 100% RNN and efficiently test-time-trains its state via in-context gradient descent at every token, in parallel.
RWKV papers on rwkv.com: 13 new papers in Mar 2025 🔥 RWKV-7 "Goose" 🪿 is 100% RNN and a meta-in-context learner, efficiently test-time-training its state on the context via in-context gradient descent at every token, in parallel.
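A minimal sketch of what "test-time-training the state via in-context gradient descent" means in the delta-rule view of a matrix-valued recurrent state: each token takes one gradient step on a reconstruction loss over the state. The real RWKV-7 kernel adds per-channel decay, separate removal keys, and a parallel scan across the sequence; this toy loop only illustrates the gradient-descent interpretation.

```python
import numpy as np

# Toy delta-rule view of an "in-context gradient descent" state update
# (illustrative only; not the actual RWKV-7 recurrence).
d = 8
S = np.zeros((d, d))                      # recurrent state: a d x d matrix memory
rng = np.random.default_rng(0)

for t in range(16):
    k = rng.standard_normal(d)            # key for this token
    v = rng.standard_normal(d)            # value to store
    lr = 0.5                              # in-context learning rate

    # One gradient step on the per-token loss  L = 0.5 * ||S k - v||^2.
    # grad_S L = (S k - v) k^T, so the update is the classic delta rule.
    S = S - lr * np.outer(S @ k - v, k)

# Reading the state with a query is just a matrix-vector product.
q = rng.standard_normal(d)
out = S @ q
```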
Try RWKV-8 DeepEmbed if you haven't 🔥 Better than Gemma3n PLE, and easier to use too.
RWKV-8 "Heron" preview (1) - DeepEmbed. Seems Gemma3n is trying similar tricks (Per-Layer Embedding), so I will discuss it first 🪶 It's essentially free performance - lots of params, but can be offloaded to RAM/SSD, and simple to train and deploy🚀