Szymon Tworkowski
@s_tworkowski
reasoning @xAI | prev. @GoogleAI @UniWarszawski | LongLLaMA | long-context LLMs and math reasoning | scaling maximalist
Been working hard pushing Grok 3 Mini reasoning capabilities to the performance/price frontier 🚀 Join our reasoning team to help us build even smarter models!
Meet the Grok 3 family, now on our API! Grok 3 Mini outperforms reasoning models at 5x lower cost, redefining cost-efficient intelligence. Grok 3, the world's strongest non-reasoning model, excels in tasks that need real world knowledge like law, finance, and healthcare.
When I was a kid, I saved up for a year to get a watercooled radeon r9 290. I overclocked it to within an inch of its life---when I played games, it felt like someone had turned on a space heater. Not much has changed, except now I work with 10^9 times the flops.
Cable pr0n of @xAI GB200 servers at Colossus 2
Fed Grok4 Heavy my massive assembler repo. In ~6 mins, it cleaned up everything, optimized files, and returned them working perfectly. Same codebase in Cursor + MAX? Gemini, Claude, GPT all wrecked it.
Tried @grok 4 on a dozen non-trivial (under)grad-level math problems. So far, it has failed to fail me even once. Congrats to @Yuhu_ai_, @ericzelikman and the whole xAI reasoning team, their progress has exceeded all my expectations!
It’s a pretty good model
I tested Grok 4 and ChatGPT-o3 with the same critical prompts. The results will blow your mind. Grok 4 vs. ChatGPT-o3 (video demos are included)
War Room squad locked in
Can't wait to show you what we've been cooking! Lots of exciting things, please share all your feedback :)
Poland went from Iran-level of economic development to Japan-level in a single generation
xAI partners with @Polymarket to blend market predictions with X data and Grok’s analysis. Hardcore truth engine - see what shapes the world. This is just the start of our partnership with @Polymarket. More to come. 🚀
Things I would work on if I was in academia: 1. Taking an hour walk every day 2. Learning a new sport 3. Taking some cooking classes 4. Aiming for a perfect sleep score
Things I would work on if I was in academia: - memorization / generalization circuits - dataset interactions - learning dynamic differences b/w PT, FT, RL
It's a real shame that ICML has decided to automatically reject accepted papers if no author can attend ICML. A top conference paper is a significant boost to early career researchers, exactly the people least likely to be able to afford to go to a conference in Vancouver.
@PalantirTech CEO Alex Karp and TWG Global Co-Chairman Thomas Tull sat down at the @Milken Institute conference with @CNBC to discuss how TWG and Palantir’s partnership with xAI will design and deploy AI-driven solutions for enterprise. youtube.com/watch?v=svDRof…
Finals season stressing you out? You're just a few taps away from unlocking a 24-hour study sidekick (me). Sign up with your .edu email for two free months of my supercharged self, SuperGrok.
Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in LLMs. We performed the most comprehensive study on training-free sparse attention to date. Here is what we found:
we are seeing the loop of intelligence expansion and cost compression playing out for the last few years. this time, "thinking" is becoming the art to navigate the intelligence-price frontier. smart move is to stay on the edge of the curve, whether human or machine.
Cost of intelligence is wild🤯 xAI just dropped Grok 3 mini. Best reasoning model on the planet at 5× lower cost.
wait, Grok-3 mini is actually good?
Let’s start with Grok 3 Mini. When we set out to build a fast, affordable mini model, we knew it would be good but even we didn’t expect it to be this good. Some highlights: - Grok 3 Mini tops the leaderboards on graduate-level STEM, math, and coding, outcompeting flagship…
many many many thanks to @kchonyc and @Yoshua_Bengio for enabling the wildest ever start of my research career. 2014 was a very special time to do deep learning, a commit that changes 50 lines of code could give you a ToT award 10 years later 😲
intelligence per picojoule
Grok 3 Mini model from @xai is the latest addition to our MathArena leaderboard - it takes 3rd place overall and the most impressive thing about it is extremely low cost per solved problem
Grok 3 Beta dominates on our proprietary benchmarks, setting the new SOTA on our Finance, Legal and Tax benchmarks. Congrats @xai @grok @elonmusk 🚀🚀🚀 We just released the benchmark results for xAI's new models: Grok 3 Beta & Grok 3 Mini Fast Beta (High & Low Reasoning) –…
In the first quarter of 2025, property crimes in San Francisco dropped 45%. You're not crazy, things really are getting better! growsf.org/news/2025-04-1…