Daria Soboleva
@dmsobol
Making MoE models work @Cerebras & posting about it | Creator of SlimPajama | Ex-@Google @Yandex @Cisco.
After more than a year of getting burned with MoE gotchas, I finally sat down and wrote the guide I wish existed. Every paper skips the messy production details. This fills those gaps. No theory without implementation. cerebras.ai/moe-guide
Let's talk about MoE:
🔶 How many experts should you use?
🔶 How does dynamic routing actually behave in production?
🔶 How do you debug a model that won’t train?
🔶 What does 8x7B actually mean for memory and compute?
🔶 What hardware optimizations matter for sparse models?…
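As a quick illustration of the memory-vs-compute question above, here's a minimal sketch of why "8x7B" is neither 56B of per-token compute nor a simple 8×7B memory story: the split between shared and per-expert parameters below is a made-up assumption for illustration, not a figure from the guide or from any specific model.

```python
# Hypothetical sketch: rough parameter math for an "8x7B"-style MoE.
# All numbers are illustrative assumptions, not figures from the guide.

def moe_param_estimate(n_experts=8, top_k=2,
                       expert_params=7e9,      # per-expert FFN params (assumed)
                       shared_params=1.5e9):   # attention/embeddings shared by all tokens (assumed)
    # Total parameters: everything you must hold in memory.
    total = shared_params + n_experts * expert_params
    # Active parameters: what a single token actually runs through with top-k routing.
    active = shared_params + top_k * expert_params
    return total, active

total, active = moe_param_estimate()
print(f"total params  ~{total / 1e9:.0f}B  -> drives memory footprint")
print(f"active params ~{active / 1e9:.0f}B  -> drives per-token compute")
```

The point of the sketch: memory scales with the total expert count, while per-token compute scales only with the routed top-k, which is what makes MoE sparsity attractive in the first place.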
Thanks to @aiDotEngineer for releasing the recording of our Mixture of Agents workshop! Watch it here: youtube.com/watch?v=tzRvcT… 🧵 with insights from it: x.com/dmsobol/status…
The most annoying thing about working with LLMs isn't that they're wrong -- it's the endless refinement loop. Here's how Mixture of Agents (MoA) eliminates the back-and-forth that kills productivity. 1/n 🧵