Vithu Thangarasa
@vithursant19
Principal ML Research Scientist at @CerebrasSystems, prev. at @Tesla and @UberAILabs, and former grad student at @uoguelph_mlrg and @VectorInst.
Excited to be here at ICML 2025 in Vancouver! 🇨🇦 Come swing by the @CerebrasSystems booth (#108) to meet the team, chat about our ICML work, and see how Cerebras delivers the fastest inference in the world across a wide range of frontier models. Thrilled to be presenting two…

Ever since the launch of Cerebras inference at Hot Chips 2024, Cerebras has been handing Groq massive Ls. DeepSeek R1 Distill Llama 70B: Cerebras: 2,256 tok/s/user Groq: 398 tok/s/user
'Jonathan Ross and I made this bet in 2017. Groq is now the fastest inference solution in market' Society would expect @chamath to be truthful. I mean pick a model...any model. Look at independent benchmarks. These charts aren't hard to read.
The worst part of @CerebrasSystems inference? You don't have time to make an espresso while Cline codes. Join our hackathon next Saturday to explore the new paradigm: instant inference. $5k in prizes → RSVP below
Grifters gonna grift. @CerebrasSystems is faster on every model that matters.
.@JonathanRoss321 (the inventor and father of TPU) and I made this bet in 2017. @GroqInc is now the fastest inference solution in market today. Here are some lessons learned so far: - if we assume we get to Super Intelligence and then General Intelligence, the entire game…
After more than a year of getting burned with MoE gotchas, I finally sat down and wrote the guide I wish existed. Every paper skips the messy production details. This fills those gaps. No theory without implementation. cerebras.ai/moe-guide
Let's talk about MoE: 🔶 How many experts should you use? 🔶 How does dynamic routing actually behave in production? 🔶 How do you debug a model that won’t train? 🔶 What does 8x7B actually mean for memory and compute? 🔶 What hardware optimizations matter for sparse models?…
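Since a couple of these questions come up constantly, here is a rough sketch (my own illustration, not taken from the guide) of what top-k routing looks like and why "8x7B" is bigger than it sounds; the function names, the k=2 choice, and the parameter sizes are assumptions picked only to make the arithmetic visible.

```python
# Hypothetical sketch: top-k gating of the kind "8x7B" MoE layers use, plus the
# back-of-envelope memory math showing why all experts must be resident in memory
# even though only k of them run per token. Sizes below are illustrative.
import torch
import torch.nn.functional as F

def top_k_route(hidden, router_weight, k=2):
    """hidden: [tokens, d_model], router_weight: [d_model, num_experts]."""
    logits = hidden @ router_weight                 # router score per expert
    probs = F.softmax(logits, dim=-1)
    gate, expert_idx = probs.topk(k, dim=-1)        # pick k experts per token
    gate = gate / gate.sum(dim=-1, keepdim=True)    # renormalize the k gate weights
    return expert_idx, gate

# Memory vs. compute: every expert's weights sit in memory, but each token only
# pays FLOPs for the k experts it is routed to. Roughly Mixtral-8x7B-shaped numbers:
shared, per_expert, num_experts, k = 2.0e9, 5.6e9, 8, 2
resident = shared + num_experts * per_expert        # what must fit in memory
active = shared + k * per_expert                    # what each token computes with
print(f"resident ~{resident/1e9:.0f}B params, active per token ~{active/1e9:.0f}B")
```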
Generate and iterate on code instantly. 40x faster than Sonnet-4. Free to use. Get started with @cline and @cerebrassystems below 👇
According to OpenRouter, @CerebrasSystems is more than twice as fast as @GroqInc. Find today's data here: openrouter.ai/meta-llama/lla…
Introducing powerful new features in Le Chat, making it more capable and more fun!
The world's fastest AI inference is now available in @awscloud Marketplace. It's easier than ever to access models like @AIatMeta Llama, @Alibaba_Qwen, @deepseek_ai, consolidate billing, and experience the speed of Cerebras.
Frontier AI is now on Cerebras. This week we are launching Qwen3-235B—@Alibaba’s flagship reasoning model that rivals ChatGPT and Claude. In classic Cerebras style, we run the model at 1,500 tokens/second. That means reasoning time goes from 60 seconds on GPUs to just 0.6…
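The speedup claim is just throughput arithmetic; here is a quick sanity check (the 900-token reasoning trace and the 15 tok/s GPU baseline are my assumptions, chosen only to make the numbers land where the tweet does).

```python
# Back-of-envelope: latency = output_tokens / (tokens per second).
# Both the token count and the GPU baseline rate are assumed for illustration.
reasoning_tokens = 900
for name, tok_per_s in [("GPU baseline", 15), ("Cerebras", 1500)]:
    print(f"{name}: {reasoning_tokens / tok_per_s:.1f} s")
# -> GPU baseline: 60.0 s, Cerebras: 0.6 s
```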
Deep research on ChatGPT takes over ten minutes. We do it under 30 seconds. Using Qwen3-235B on Cerebras, we've benchmarked agentic search workflows that process 10M+ document enterprise repositories in seconds, not hours. Early enterprise customers report 20x faster…
We launched Qwen 235B on the Cerebras cloud. Not only is it 18x faster than the leading GPU offering, but it cuts response time from minutes to less than a second. Faster responses. Faster answers. What's not to like?
"Before Cerebras, everything sits sub 200 tokens per second output. And after us, on every model, you have vast improvements, order of magnitude improvements. And what this allows you to do is deliver something special and different to your customers —faster responses, richer…
Sean Lie, Cerebras CTO, highlighted an uncomfortable truth for the AI industry at the @VentureBeat Transform panel.
Featured Paper at @icmlconf - The International Conference on Machine Learning: SD² - Self-Distilled Sparse Drafters. Speculative decoding is a powerful technique for reducing the latency of Large Language Models (LLMs), offering a fault-tolerant framework that enables the…
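For anyone new to the technique the paper builds on, here is a bare-bones draft-and-verify loop under greedy decoding (my own illustration, not the SD² method itself; the k=4 draft length and the callable-based interface are assumptions).

```python
# Hypothetical sketch of greedy speculative decoding: a small drafter proposes k
# tokens cheaply, the large target model checks them, and we keep the longest
# prefix the target agrees with. Not the SD² algorithm itself.
from typing import Callable, List

def speculative_decode(
    target_next: Callable[[List[int]], int],   # greedy next-token fn of the big model
    draft_next: Callable[[List[int]], int],    # greedy next-token fn of the drafter
    prompt: List[int],
    max_new_tokens: int = 64,
    k: int = 4,                                # speculative draft length (assumed)
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new_tokens:
        # 1) Drafter proposes k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2) Target verifies the draft; in a real system this is a single batched
        #    forward pass over all k positions, which is where the speedup comes from.
        accepted = 0
        for i in range(k):
            expected = target_next(tokens + draft[:i])
            if expected == draft[i]:
                accepted += 1
            else:
                # 3) First mismatch: keep the accepted prefix plus the target's token.
                tokens.extend(draft[:accepted] + [expected])
                break
        else:
            tokens.extend(draft)               # whole draft accepted
    return tokens
```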
AI Scaling Laws from the Cerebras Perspective. A new blog post by our CTO, Sean Lie: cerebras.net/blog/the-cereb…
Preprint: arxiv.org/abs/2504.08838 Please let us know if you have any questions or comments, we'd love to hear your thoughts. This was joint work with @mklasby, @NishSinnadurai, Valavan Manohararajah, Sean Lie, @yanii, & @vithursant19.