After more than a year of getting burned with MoE gotchas, I finally sat down and wrote the guide I wish existed. Every paper skips the messy production details. This fills those gaps. No theory without implementation. cerebras.ai/moe-guide
Let's talk about MoE:
🔶 How many experts should you use?
🔶 How does dynamic routing actually behave in production?
🔶 How do you debug a model that won’t train?
🔶 What does 8x7B actually mean for memory and compute?
🔶 What hardware optimizations matter for sparse models?…
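A minimal sketch of the routing question above, assuming Mixtral-style top-2 gating (illustrative only; the guide's own code may differ). On the "8x7B" question: the label double-counts shared attention weights, which is why Mixtral 8x7B is roughly 47B total and 13B active parameters rather than 56B.

```python
# Hedged sketch of top-2 MoE routing, assuming a Mixtral-style layer
# (not code from the Cerebras guide).
import torch
import torch.nn.functional as F

def top2_route(x, gate_weight, experts):
    """x: (tokens, d_model); gate_weight: (n_experts, d_model)."""
    logits = x @ gate_weight.T                    # (tokens, n_experts)
    top_p, top_idx = F.softmax(logits, -1).topk(2, dim=-1)
    top_p = top_p / top_p.sum(-1, keepdim=True)   # renormalize the 2 gates
    out = torch.zeros_like(x)
    for slot in range(2):                         # 2 active experts per token
        for e, expert in enumerate(experts):
            mask = top_idx[:, slot] == e          # tokens routed to expert e
            if mask.any():
                out[mask] += top_p[mask, slot, None] * expert(x[mask])
    return out
```

Memory must hold all eight experts; compute touches only the two selected per token, which is the crux of the memory-vs-compute question.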
🧵1/ Meet Plan for Speed: Dilated Unmasking Scheduler (DUS) for Masked Diffusion LMs - a drop-in, inference-only planner that shatters the speed–quality trade-off. 📄 Paper: arxiv.org/abs/2506.19037 🌐 Site and Demos: omerlux.github.io/DUS-for-MDLMs/
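The paper defines the actual scheduler; absent that, here is one hedged reading of "dilated unmasking" in code, with `strides` as an assumed knob: reveal positions stride by stride, so each parallel decode step fills spread-out, weakly coupled slots in a handful of steps instead of one per token.

```python
# Hedged guess at a dilated unmasking order for a masked diffusion LM;
# my reading of the idea, not the paper's actual DUS algorithm.
def dilated_schedule(seq_len, strides=(8, 4, 2, 1)):
    revealed, plan = set(), []
    for s in strides:
        step = [i for i in range(0, seq_len, s) if i not in revealed]
        revealed.update(step)
        plan.append(step)  # positions unmasked together in this step
    return plan

print(dilated_schedule(16))
# [[0, 8], [4, 12], [2, 6, 10, 14], [1, 3, 5, 7, 9, 11, 13, 15]]
```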
Agents aren’t reliable. They don’t learn from experience. At @composiohq, we provide skills that evolve with your agents. @lightspeedvp gave us $25M to make agents usable.
🤔 Feel like your AI is bullshitting you? It’s not just you. 🚨 We quantified machine bullshit 💩 Turns out, aligning LLMs to be "helpful" via human feedback actually teaches them to bullshit—and Chain-of-Thought reasoning just makes it worse! 🔥 Time to rethink AI alignment.
Introducing Diff-Mamba! 🧠🔥 Differential design has been shown to reduce over-allocation of attention to irrelevant context in Transformers—improving robustness, ICL, retrieval, and long-context capabilities. Can it be effectively applied to Mamba? Answers in the thread🧵👇
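For readers new to the differential idea: the core trick (from Differential Transformer) is to run two parallel mixers and subtract a learned fraction of the second, canceling whatever both branches allocate to irrelevant context. Below is a hedged sketch of how that could wrap a Mamba block, with the mixers passed in as stubs; it mirrors the general recipe, not the paper's exact architecture.

```python
# Hedged sketch: differential combination of two sequence mixers.
# mixer_a / mixer_b stand in for Mamba blocks (any (B, T, D) -> (B, T, D)
# module works); not the actual Diff-Mamba code.
import torch
import torch.nn as nn

class DiffMixer(nn.Module):
    def __init__(self, mixer_a: nn.Module, mixer_b: nn.Module, d_model: int):
        super().__init__()
        self.mixer_a, self.mixer_b = mixer_a, mixer_b
        self.lmbda = nn.Parameter(torch.tensor(0.5))  # learned subtraction weight
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # subtract a learned fraction of the second branch's output
        return self.norm(self.mixer_a(x) - self.lmbda * self.mixer_b(x))
```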
🚀 Meet Reka Research: agentic AI that 🤔 thinks → 🔎 searches → ✏️ cites across the open web and private docs to answer your questions. 🥇 State-of-the-art performance, available now via our API and Playground!
Excited to introduce Reka Vision, an agentic visual understanding and search platform. Transform your unstructured multimodal data into insights and actions.
Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here!🧵⬇️
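Presumably the scoring times agent code against a reference on fixed inputs behind a correctness gate; a hedged sketch of such a harness (function names are illustrative placeholders, not AlgoTune's API):

```python
# Hedged sketch of a speedup-scoring harness in the spirit of AlgoTune.
import time

def score(reference_solve, candidate_solve, inputs):
    def bench(fn):
        t0 = time.perf_counter()
        outs = [fn(x) for x in inputs]
        return time.perf_counter() - t0, outs

    t_ref, ref_outs = bench(reference_solve)
    t_new, new_outs = bench(candidate_solve)
    if new_outs != ref_outs:   # exact-match correctness gate, for simplicity
        return 0.0             # no speed credit for wrong answers
    return t_ref / t_new       # >1.0 means a genuine speedup
```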
Turbocharge speed & quality gains in Diffusion World Models! 🚨
- 8x8 AE w/ depth latents → 4x fewer tokens, 4x FPS boost
- 4x4 flow+depth AE in progress → next-level consistency
- DMD distillation: 16→2 steps = 8x faster sampling
- Custom RoPE fix → 20x…
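The token arithmetic behind the first bullet is easy to check: an 8x8 autoencoder downsamples each spatial axis 8x, so against a 4x4 AE each frame yields (8/4)^2 = 4x fewer latent tokens (256x256 below is just an example resolution):

```python
# Back-of-envelope for "8x8 AE -> 4x fewer tokens" vs a 4x4 baseline.
H = W = 256                         # example frame resolution (assumed)
tokens_4x4 = (H // 4) * (W // 4)    # 4096 latent tokens per frame
tokens_8x8 = (H // 8) * (W // 8)    # 1024 latent tokens per frame
print(tokens_4x4 / tokens_8x8)      # 4.0 -> 4x fewer tokens, ~4x FPS
```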
Eeehaaw! Having so much fun working on this w/ @arankomatsuzaki and @broyojo42. Come hack with us!
We've been reproducing various AlphaEvolve results and seeing early promise. Here's one from circle packing. We're also getting strong results on sphere packing, finite set sums/diffs (à la Terry Tao), and >10% AIME abs. gains by mimicking Deep Think. More soon. Credits:…
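For context on the circle-packing task: the usual objective is to place n disjoint circles inside the unit square and maximize the sum of radii, with an evaluator scoring each evolved program's proposal. A hedged sketch of such a verifier (the actual harness's checks may differ):

```python
# Hedged sketch of a circle-packing evaluator: returns the sum of radii
# if the configuration is valid, else 0. Not the actual harness code.
import math

def packing_score(circles, eps=1e-9):
    """circles: list of (x, y, r) inside the unit square."""
    for x, y, r in circles:
        if r < 0 or x - r < -eps or x + r > 1 + eps \
                 or y - r < -eps or y + r > 1 + eps:
            return 0.0                               # leaves the square
    for i, (x1, y1, r1) in enumerate(circles):
        for x2, y2, r2 in circles[i + 1:]:
            if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - eps:
                return 0.0                           # circles overlap
    return sum(r for _, _, r in circles)             # objective to maximize
```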
🚀 Introducing Hunyuan-A13B, our latest open-source LLM. As an MoE model, it leverages 80B total parameters with just 13B active, delivering powerful performance that scores on par with o1 and DeepSeek across multiple mainstream benchmarks. Hunyuan-A13B features a hybrid…
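Back-of-envelope on what 80B-total / 13B-active buys you (standard serving estimates, not Tencent's published numbers): every expert stays resident in memory, but per-token compute scales only with the active parameters.

```python
# Rough MoE serving math for an 80B-total / 13B-active model.
total_params, active_params = 80e9, 13e9
print(f"bf16 weights: {total_params * 2 / 1e9:.0f} GB")        # ~160 GB resident
print(f"per-token fwd: {2 * active_params / 1e9:.0f} GFLOPs")  # ~26 GFLOPs (2 FLOPs/param)
```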