Keiran Paster
@keirp1
MDR at xAI
Introducing OpenWebMath, a massive dataset containing every math document found on the internet - with equations in LaTeX format! 🤗 Download on @HuggingFace: huggingface.co/datasets/open-… 📝 Read the paper: arxiv.org/abs/2310.06786 w/ @dsantosmarco, @zhangir_azerbay, @jimmybajimmyba!

HOLY MOLY THE BENCHMARKS AIN'T LYING–– THIS IS THE BEST MODEL EVER!! @XAI FUCKIN COOOKED 🫶 ILY SUPERGROK 🫶
We invented so many innovative ways to feed the model challenging questions with the right signals to unlock all that compute and 🔥 the GPUs. This is the new beginning.
This chart is breaking my brain. When you compare cost against score, the ONLY model in the green is Grok 3 Mini.
Announcing the Nous RL Environments Hackathon in SF! Create with Atropos, Nous' RL environments framework, and claim your stake of a $50,000 prize pool. Partners - @xai @nvidia @nebiusai @SHACK15sf @akashnet_ @LambdaAPI @tensorstax and @runpod_io May 18th. Sign up below 👇👇
Pretty cool! We are the SoTA Candy Crush model!
🚨 New Challenger: GROK joins the Game Arena Benchmark! We evaluated Grok3-mini-beta: thinking on four games: 🧩 2048 | 🧱 Sokoban | 🍬 Candy Crush | 🎮 Phoenix Wright. With fast progress, it’s already comparable to top models like OpenAI’s O1, previous O3-mini, and…
Is constrained decoding ethical?
We remain deeply uncertain about the idea of “model welfare”. There’s no scientific consensus on it—or even on how to research it. We’re approaching the topic as carefully as we can. Find out more: anthropic.com/research/explo…
Grok-3 mini is freaking cheap. $0.30/$0.50 in/out per million tokens. The xAI team delivered something special here. The intelligence versus cost is unbelievably good
Today, @xAI launched a new model, Grok 3, so we’re putting it to the test to see how Grok’s latest model stacks up against Intelligent Content Management workflows. Here’s what we found: ↳ xAI’s Grok 3 has proven to be a top-performing model in our tests for both single &…
Cost of intelligence is wild🤯 xAI just dropped Grok 3 mini. Best reasoning model on the planet at 5× lower cost.
wait, Grok-3 mini is actually good?
Let’s start with Grok 3 Mini. When we set out to build a fast, affordable mini model, we knew it would be good but even we didn’t expect it to be this good. Some highlights: - Grok 3 Mini tops the leaderboards on graduate-level STEM, math, and coding, outcompeting flagship…
Grok 3 now available in beta in the Box AI Studio, and it performs extremely well at single and multi-doc Q&A as well as enterprise data extraction. Here's a test with Box AI where it generates a comprehensive report based on a number of earnings documents.
Great job done by Szymon, Keiran, Ziniu et al on the mini reasoning models!
Been working hard pushing Grok 3 Mini reasoning capabilities to the performance/price frontier 🚀 Join our reasoning team to help us build even smarter models!
In case anyone still doesn't see the insane speed at which models are getting smarter and cheaper: Yesterday, Google released Gemini 2.5 Flash, a very efficient reasoning model. Today, Grok 3 mini is stronger on most benchmarks and 7x cheaper! x.com/xai/status/191…
Grok3 mini reasoning high is a great model
tiny oversight, think you missed a model. happy to help out!
Meet the Grok 3 family, now on our API! Grok 3 Mini outperforms reasoning models at 5x lower cost, redefining cost-efficient intelligence. Grok 3, the world's strongest non-reasoning model, excels in tasks that need real world knowledge like law, finance, and healthcare.
grok 3 mini is the top-scoring LM on their code generation benchmark too (it scores worse on autocomplete-style code completion, basically because of different code formatting)
Introducing CipherBench v2 20 prompts to test implicit reasoning. No instructions. Just ciphers. — About the Benchmark CipherBench v2 continues the original goal of testing whether language models can recognize and solve hidden patterns without being told what to do. Where…
We are hiring researchers and engineers at xAI to build next-gen native multi-modal models, please apply online or dm me if you are interested! job-boards.greenhouse.io/xai/jobs/46846…