DataVoid
@DataPlusEngine
The first step in knowing is admitting you don't.
AI visionaries tend to be dreamers who cannot dream. They are so utterly engulfed in their own doctrine that their daring stabs at the truth amount to moving numbers on a plot.
Wow, the new Qwen reasoner at only 235B params is as good as the top closed frontier lab models. Big day for open source.
It was missing, so I added @AnthropicAI Opus 4 Thinking and @OpenAI o3 benchmark results to the comparison mix chart 🆚🔎 Vibe check pending, but on benchmarks it seems that we got an open model competitive with Opus 4 / o3 / Gemini 2.5 🤯
3Blue1Brown x Welch Labs is the crossover we didn’t know we needed. stunning breakdown of diffusion models. visual, intuitive, elegant.
New video on the details of diffusion models: youtu.be/iv-5mZ_9CPY Produced by @welchlabs, this is the first in a small series on 3b1b this summer. I enjoyed providing editorial feedback throughout the last several months, and couldn't be happier with the result.
damn, I've always had a mental model that an action of an LM should be a sequence (a turn, or until a tool call) instead of a token, but people kept telling me that token-level loss is better… Thanks to the Qwen team for verifying my mental model; now it makes much more sense.
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
Wan 2.2 is now officially confirmed open-source: text-to-image and video. Open-source will soon get a powerful new tool.
🚨Announcing the open-source release of Wan2.2. Stay tuned.
Time Blindness is the last big barrier for LLMs and image/video models.
It took me 5 milliseconds to make this, proving that AI bros are idiots.
RNN+Pretrain+Scaling is all you need. Introducing RWKV-7 G0 🪿 7.2B, the strongest pure RNN reasoning model (can self-correct math mistakes). Download & Details: github.com/BlinkDL/RWKV-L… and it's only +2T tokens - I am training stronger RNNs🙂
RWKV7-G1 "GooseOne" 🪿 2.9B release: pure RNN (attention-free) reasoning model, +5.2T tokens, comparable with Qwen2.5 3B / Llama3.2 3B and fully multilingual. Chat demo & weights on RWKV.com 7B training in progress.
Published version 0.0.9 of imscore
1. Added EvalMuse (used for SeedDream 2.0 evals)
2. Added preliminary support for CycleReward (contrib @NagaSaiAbhinay)
3. Added VQAScore (used for Imagen 3 evals)
github.com/RE-N-Y/imscore
*Emergence and Evolution of Interpretable Concepts in Diffusion Models* by @berk_tinaz @zalan_fabian @mahdisoltanol SAEs trained on cross-attention layers of StableDiffusion are (surprisingly) good and can be used to intervene on the generation. arxiv.org/abs/2504.15473
I might go back to my roots and dip my toe back into LLM fine-tuning. [I never released anything related to that work publicly]
kimi k2 is a contributor on the kimi k2 paper. let that sink in.
All because they don't want to pay to train people. It takes time to train an AI workforce, but that's the only way. And it makes more sense to train people than to pay billions of dollars for one or two guys.
Called it years ago. I had it dead on the mark from the start. Won the game before anyone knew it existed. Total schizo victory
The era of diffusion LMs might be upon us. "Diffusion Beats Autoregressive in Data-Constrained Settings" finds that diffusion LMs outperform AR models when you're bottlenecked by data rather than FLOPs: the same data can be reused for ~100 epochs vs ~4, because the dLLM learns far more from each pass!
Why do video models handle motion so poorly? It might be lack of motion equivariance. Very excited to introduce: Flow Equivariant RNNs (FERNNs), the first sequence models to respect symmetries over time. Paper: arxiv.org/abs/2507.14793 Blog: kempnerinstitute.harvard.edu/research/deepe… 1/🧵
Guys, AGI incoming. This new AI does reasoning WITHOUT PRETRAINING, running on a potato (2 GPUs). This is a seriously big deal. WTF. 40.3% on ARC-AGI. Running on a toaster. How? 👀
🚀Introducing Hierarchical Reasoning Model🧠🤖 Inspired by the brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, no pretraining or CoT! Unlock the next AI breakthrough with…
Diffusion Beats Autoregressive in Data-Constrained Settings Comparison of diffusion and autoregressive language models from 7M to 2.5B params and up to 80B training tokens. Key findings: 1. Diffusion models surpass autoregressive models given sufficient compute. Across a wide…
my mutual trained a 7B model that's definitely better than you at JEE maths anon
Excited to share Aryabhatta 1.0, our leading model that scores 90.2% on JEE Mains, outperforming frontier models like o4-mini and Gemini 2.5 Flash. Trained by us at @AthenaAgentRL, in collaboration with @physics__wallah, using custom RLVR training on 130K+ curated JEE problems…