Daria Soboleva
@dmsobol
Making MoE models work @Cerebras & posting about it | Creator of SlimPajama | Ex-@Google @Yandex @Cisco.
After more than a year of getting burned with MoE gotchas, I finally sat down and wrote the guide I wish existed. Every paper skips the messy production details. This fills those gaps. No theory without implementation. cerebras.ai/moe-guide
Let's talk about MoE:
🔶 How many experts should you use?
🔶 How does dynamic routing actually behave in production?
🔶 How do you debug a model that won’t train?
🔶 What does 8x7B actually mean for memory and compute?
🔶 What hardware optimizations matter for sparse models?…
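As a quick illustration of the memory-vs-compute question above, here's a minimal sketch of why "8x7B" is neither 56B of per-token compute nor a simple 8×7B memory story: the split between shared and per-expert parameters below is a made-up assumption for illustration, not a figure from the guide or from any specific model.

```python
# Hypothetical sketch: rough parameter math for an "8x7B"-style MoE.
# All numbers are illustrative assumptions, not figures from the guide.

def moe_param_estimate(n_experts=8, top_k=2,
                       expert_params=7e9,      # per-expert FFN params (assumed)
                       shared_params=1.5e9):   # attention/embeddings shared by all tokens (assumed)
    # Total parameters: everything you must hold in memory.
    total = shared_params + n_experts * expert_params
    # Active parameters: what a single token actually runs through with top-k routing.
    active = shared_params + top_k * expert_params
    return total, active

total, active = moe_param_estimate()
print(f"total params  ~{total / 1e9:.0f}B  -> drives memory footprint")
print(f"active params ~{active / 1e9:.0f}B  -> drives per-token compute")
```

The point of the sketch: memory scales with the total expert count, while per-token compute scales only with the routed top-k, which is what makes MoE sparsity attractive in the first place.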
Thanks to @aiDotEngineer for releasing the recording of our Mixture of Agents workshop! Watch it here: youtube.com/watch?v=tzRvcT… 🧵 with insights from it: x.com/dmsobol/status…
The most annoying thing about working with LLMs isn't that they're wrong -- it's the endless refinement loop. Here's how Mixture of Agents (MoA) eliminates the back-and-forth that kills productivity. 1/n 🧵