After more than a year of getting burned with MoE gotchas, I finally sat down and wrote the guide I wish existed. Every paper skips the messy production details. This fills those gaps. No theory without implementation. cerebras.ai/moe-guide
Let's talk about MoE:
🔶 How many experts should you use?
🔶 How does dynamic routing actually behave in production?
🔶 How do you debug a model that won’t train?
🔶 What does 8x7B actually mean for memory and compute?
🔶 What hardware optimizations matter for sparse models?…
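A minimal sketch of the routing question above, assuming Mixtral-style top-2 gating (illustrative only; the guide's own code may differ). On the "8x7B" question: the label double-counts shared attention weights, which is why Mixtral 8x7B is roughly 47B total and 13B active parameters rather than 56B.

```python
# Hedged sketch of top-2 MoE routing, assuming a Mixtral-style layer
# (not code from the Cerebras guide).
import torch
import torch.nn.functional as F

def top2_route(x, gate_weight, experts):
    """x: (tokens, d_model); gate_weight: (n_experts, d_model)."""
    logits = x @ gate_weight.T                    # (tokens, n_experts)
    top_p, top_idx = F.softmax(logits, -1).topk(2, dim=-1)
    top_p = top_p / top_p.sum(-1, keepdim=True)   # renormalize the 2 gates
    out = torch.zeros_like(x)
    for slot in range(2):                         # 2 active experts per token
        for e, expert in enumerate(experts):
            mask = top_idx[:, slot] == e          # tokens routed to expert e
            if mask.any():
                out[mask] += top_p[mask, slot, None] * expert(x[mask])
    return out
```

Memory must hold all eight experts; compute touches only the two selected per token, which is the crux of the memory-vs-compute question.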
🧵1/ Meet Plan for Speed: Dilated Unmasking Scheduler (DUS) for Masked Diffusion LMs - a drop-in, inference-only planner that shatters the speed–quality trade-off. 📄 Paper: arxiv.org/abs/2506.19037 🌐 Site and Demos: omerlux.github.io/DUS-for-MDLMs/
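The paper defines the actual scheduler; absent that, here is one hedged reading of "dilated unmasking" in code, with `strides` as an assumed knob: reveal positions stride by stride, so each parallel decode step fills spread-out, weakly coupled slots in a handful of steps instead of one per token.

```python
# Hedged guess at a dilated unmasking order for a masked diffusion LM;
# my reading of the idea, not the paper's actual DUS algorithm.
def dilated_schedule(seq_len, strides=(8, 4, 2, 1)):
    revealed, plan = set(), []
    for s in strides:
        step = [i for i in range(0, seq_len, s) if i not in revealed]
        revealed.update(step)
        plan.append(step)  # positions unmasked together in this step
    return plan

print(dilated_schedule(16))
# [[0, 8], [4, 12], [2, 6, 10, 14], [1, 3, 5, 7, 9, 11, 13, 15]]
```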
Agents aren’t reliable. They don’t learn from experience. At @composiohq, we provide skills that evolve with your agents. @lightspeedvp gave us $25M to make agents usable.
🤔 Feel like your AI is bullshitting you? It’s not just you. 🚨 We quantified machine bullshit 💩 Turns out, aligning LLMs to be "helpful" via human feedback actually teaches them to bullshit—and Chain-of-Thought reasoning just makes it worse! 🔥 Time to rethink AI alignment.
Introducing Diff-Mamba! 🧠🔥 Differential design has been shown to reduce over-allocation of attention to irrelevant context in Transformers—improving robustness, ICL, retrieval, and long-context capabilities. Can it be effectively applied to Mamba? Answers in the thread🧵👇
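For readers new to the differential idea: the core trick (from Differential Transformer) is to run two parallel mixers and subtract a learned fraction of the second, canceling whatever both branches allocate to irrelevant context. Below is a hedged sketch of how that could wrap a Mamba block, with the mixers passed in as stubs; it mirrors the general recipe, not the paper's exact architecture.

```python
# Hedged sketch: differential combination of two sequence mixers.
# mixer_a / mixer_b stand in for Mamba blocks (any (B, T, D) -> (B, T, D)
# module works); not the actual Diff-Mamba code.
import torch
import torch.nn as nn

class DiffMixer(nn.Module):
    def __init__(self, mixer_a: nn.Module, mixer_b: nn.Module, d_model: int):
        super().__init__()
        self.mixer_a, self.mixer_b = mixer_a, mixer_b
        self.lmbda = nn.Parameter(torch.tensor(0.5))  # learned subtraction weight
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # subtract a learned fraction of the second branch's output
        return self.norm(self.mixer_a(x) - self.lmbda * self.mixer_b(x))
```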
🚀 Meet Reka Research: agentic AI that 🤔 thinks → 🔎 searches → ✏️ cites across the open web and private docs to answer your questions. 🥇 State-of-the-art performance, available now via our API and Playground!
Excited to introduce Reka Vision, an agentic visual understanding and search platform. Transform your unstructured multimodal data into insights and actions.
Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here!🧵⬇️
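Presumably the scoring times agent code against a reference on fixed inputs behind a correctness gate; a hedged sketch of such a harness (function names are illustrative placeholders, not AlgoTune's API):

```python
# Hedged sketch of a speedup-scoring harness in the spirit of AlgoTune.
import time

def score(reference_solve, candidate_solve, inputs):
    def bench(fn):
        t0 = time.perf_counter()
        outs = [fn(x) for x in inputs]
        return time.perf_counter() - t0, outs

    t_ref, ref_outs = bench(reference_solve)
    t_new, new_outs = bench(candidate_solve)
    if new_outs != ref_outs:   # exact-match correctness gate, for simplicity
        return 0.0             # no speed credit for wrong answers
    return t_ref / t_new       # >1.0 means a genuine speedup
```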
Turbocharge speed & quality gains in Diffusion World Models! 🚨
- 8x8 AE w/ depth latents → 4x fewer tokens, 4x FPS boost
- 4x4 flow+depth AE in progress → next-level consistency
- DMD distillation: 16→2 steps = 8x faster sampling
- Custom RoPE fix → 20x…
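The token arithmetic behind the first bullet is easy to check: an 8x8 autoencoder downsamples each spatial axis 8x, so against a 4x4 AE each frame yields (8/4)^2 = 4x fewer latent tokens (256x256 below is just an example resolution):

```python
# Back-of-envelope for "8x8 AE -> 4x fewer tokens" vs a 4x4 baseline.
H = W = 256                         # example frame resolution (assumed)
tokens_4x4 = (H // 4) * (W // 4)    # 4096 latent tokens per frame
tokens_8x8 = (H // 8) * (W // 8)    # 1024 latent tokens per frame
print(tokens_4x4 / tokens_8x8)      # 4.0 -> 4x fewer tokens, ~4x FPS
```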
Eeehaaw! Having so much fun working on this w/ @arankomatsuzaki and @broyojo42. Come hack with us!
We've been reproducing various AlphaEvolve results and seeing early promise. Here's one from circle packing. We're also getting strong results on sphere packing, finite set sums/diffs (à la Terry Tao), and >10% AIME abs. gains by mimicking Deep Think. More soon. Credits:…
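For context on the circle-packing task: the usual objective is to place n disjoint circles inside the unit square and maximize the sum of radii, with an evaluator scoring each evolved program's proposal. A hedged sketch of such a verifier (the actual harness's checks may differ):

```python
# Hedged sketch of a circle-packing evaluator: returns the sum of radii
# if the configuration is valid, else 0. Not the actual harness code.
import math

def packing_score(circles, eps=1e-9):
    """circles: list of (x, y, r) inside the unit square."""
    for x, y, r in circles:
        if r < 0 or x - r < -eps or x + r > 1 + eps \
                 or y - r < -eps or y + r > 1 + eps:
            return 0.0                               # leaves the square
    for i, (x1, y1, r1) in enumerate(circles):
        for x2, y2, r2 in circles[i + 1:]:
            if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - eps:
                return 0.0                           # circles overlap
    return sum(r for _, _, r in circles)             # objective to maximize
```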
🚀 Introducing Hunyuan-A13B, our latest open-source LLM. As an MoE model, it leverages 80B total parameters with just 13B active, delivering powerful performance that scores on par with o1 and DeepSeek across multiple mainstream benchmarks. Hunyuan-A13B features a hybrid…
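Back-of-envelope on what 80B-total / 13B-active buys you (standard serving estimates, not Tencent's published numbers): every expert stays resident in memory, but per-token compute scales only with the active parameters.

```python
# Rough MoE serving math for an 80B-total / 13B-active model.
total_params, active_params = 80e9, 13e9
print(f"bf16 weights: {total_params * 2 / 1e9:.0f} GB")        # ~160 GB resident
print(f"per-token fwd: {2 * active_params / 1e9:.0f} GFLOPs")  # ~26 GFLOPs (2 FLOPs/param)
```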