DataVoid
@DataPlusEngine
The first step in knowing is admitting you don't.
AI visionaries tend to be dreamers who cannot dream. They are so utterly engulfed in their own doctrine that their daring stabs at the truth amount to moving numbers on a plot.
Wow, the new Qwen reasoner at only 235B params is as good as the top closed frontier lab models. Big day for open source.
It was missing, so I added @AnthropicAI Opus 4 Thinking and @OpenAI o3 benchmark results to the comparison mix chart 🆚🔎 Vibe check pending, but on benchmarks it seems that we got an open model competitive with Opus 4 / o3 / Gemini 2.5 🤯
3Blue1Brown x Welch Labs is the crossover we didn’t know we needed. stunning breakdown of diffusion models. visual, intuitive, elegant.
New video on the details of diffusion models: youtu.be/iv-5mZ_9CPY Produced by @welchlabs, this is the first in a small series on 3b1b this summer. I enjoyed providing editorial feedback throughout the last several months, and couldn't be happier with the result.
damn, I've always had a mental model that an action of an LM should be a sequence (a turn, or until a tool call) instead of a token, but people kept telling me that token-level loss is better… Thanks to the Qwen team for verifying my mental model; now it makes much more sense.
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
Wan 2.2 is now officially confirmed open-source: text-to-image and video. Open-source will soon get a powerful new tool.
🚨Announcing the open-source release of Wan2.2. Stay tuned.
Time Blindness is the last big barrier for LLMs and image/video models.
It took me 5 milliseconds to make this, proving that AI bros are idiots.
RNN+Pretrain+Scaling is all you need. Introducing RWKV-7 G0 🪿 7.2B, the strongest pure RNN reasoning model (can self-correct math mistakes). Download & Details: github.com/BlinkDL/RWKV-L… and it's only +2T tokens - I am training stronger RNNs🙂
RWKV7-G1 "GooseOne" 🪿 2.9B release: pure RNN (attention-free) reasoning model, +5.2T tokens, comparable with Qwen2.5 3B / Llama3.2 3B and fully multilingual. Chat demo & weights on RWKV.com 7B training in progress.
Published version 0.0.9 of imscore
1. Added EvalMuse (used for SeedDream 2.0 evals)
2. Added preliminary support for CycleReward (contrib @NagaSaiAbhinay)
3. Added VQAScore (used for Imagen 3 evals)
github.com/RE-N-Y/imscore
*Emergence and Evolution of Interpretable Concepts in Diffusion Models* by @berk_tinaz @zalan_fabian @mahdisoltanol SAEs trained on cross-attention layers of StableDiffusion are (surprisingly) good and can be used to intervene on the generation. arxiv.org/abs/2504.15473
I might go back to my roots and dip my toe back into LLM fine-tuning. [I never released anything related to that work publicly]
kimi k2 is a contributor on the kimi k2 paper. let that sink in.
All because they don't want to pay to train people. It takes time to train an AI workforce, but that's the only way. And it makes more sense to train people than to pay billions of dollars for one or two guys.
Called it years ago. I had it dead on the mark from the start. Won the game before anyone knew it existed. Total schizo victory
The era of diffusion LMs might be upon us. "Diffusion Beats Autoregressive in Data-Constrained Settings" finds that diffusion LMs outperform AR models when you're bottlenecked by data rather than FLOPs: the same data can be reused for ~100 epochs vs ~4, because the dLLM learns far more from each pass!
Why do video models handle motion so poorly? It might be lack of motion equivariance. Very excited to introduce: Flow Equivariant RNNs (FERNNs), the first sequence models to respect symmetries over time. Paper: arxiv.org/abs/2507.14793 Blog: kempnerinstitute.harvard.edu/research/deepe… 1/🧵
Guys, AGI incoming. This new AI does reasoning WITHOUT PRETRAINING, running on a potato (2 GPUs). This is a seriously big deal. WTF. 40.3% on ARC-AGI. Running on a toaster. How? 👀
🚀Introducing Hierarchical Reasoning Model🧠🤖 Inspired by the brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, no pretraining or CoT! Unlock the next AI breakthrough with…
Diffusion Beats Autoregressive in Data-Constrained Settings Comparison of diffusion and autoregressive language models from 7M to 2.5B params and up to 80B training tokens. Key findings: 1. Diffusion models surpass autoregressive models given sufficient compute. Across a wide…
my mutual trained a 7B model that's definitely better than you at JEE maths anon
Excited to share Aryabhatta 1.0, our leading model that scores 90.2% on JEE Mains, outperforming frontier models like o4-mini and Gemini 2.5 Flash. Trained by us at @AthenaAgentRL, in collaboration with @physics__wallah, using custom RLVR training on 130K+ curated JEE problems…