Rosinality
@rosinality
no side-effects
Seoul, Korea
Joined October 2008
975 Following
2K Followers
Rosinality @rosinality · Jul 25
Attention-FFN disaggregation. ByteDance explored this previously (arxiv.org/abs/2504.02263), but StepFun vastly improved latency.
They just quietly released the Step3 tech report, btw: github.com/stepfun-ai/Ste… cc @zephyr_z9

Rosinality @rosinality · Jul 23
arxiv.org/abs/2507.16577 Expand the state size N times, then select the top-K among the expanded states and update them using softmax weights.
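
A rough sketch of what that one-line summary could look like in code, not the paper's actual method. Everything here is an assumption made for illustration: the function name `topk_softmax_state_update`, the `slot_keys` router vectors used to score slots, the dot-product scoring, and the additive outer-product write borrowed from plain linear attention.

```python
import math

def topk_softmax_state_update(states, slot_keys, k, v, top_k):
    """Update only the top_k highest-scoring state slots, in place.

    states:    list of N matrices (d_k x d_v, nested lists), the expanded state
    slot_keys: list of N vectors of length d_k (hypothetical router parameters)
    k, v:      key (d_k) and value (d_v) vectors for the current token
    Returns {slot index: softmax weight} for the slots that were written.
    """
    # Score every slot against the current key (assumed dot-product router).
    scores = [sum(a * b for a, b in zip(sk, k)) for sk in slot_keys]
    # Indices of the top_k largest scores.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the chosen scores only (max-subtracted for stability).
    m = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - m) for i in chosen}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    # Assumed write rule: S_i += w_i * outer(k, v) for each chosen slot.
    for i, w in weights.items():
        for r, k_r in enumerate(k):
            for c, v_c in enumerate(v):
                states[i][r][c] += w * k_r * v_c
    return weights

# Toy usage: N = 4 slots, 2x2 state per slot, write to the best 2 slots.
N, d_k, d_v = 4, 2, 2
states = [[[0.0] * d_v for _ in range(d_k)] for _ in range(N)]
slot_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
weights = topk_softmax_state_update(states, slot_keys, k=[1.0, 2.0], v=[1.0, 1.0], top_k=2)
```

In this toy run only slots 1 and 2 are written, so per-token write cost scales with K rather than N even though total state capacity grew N-fold.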

