Clive Chan
@itsclivetime
intelligence per picojoule @openai / prev led dojo workload @tesla
some personal Requests For Research - along the same lines as my full stack ML post, I'm all about what the optimal mapping of intelligence to picojoules is. - if you can draw any shapes on the EUV mask, what pJ/fma can you achieve for an FP8 tensor core? what unconventional…
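The pJ/fma framing reduces to simple arithmetic: watts divided by ops/second gives joules per op. A minimal sketch, with purely illustrative numbers (the 700 W and 2e15 FMA/s figures are assumptions, not any real chip's spec):

```python
# Back-of-envelope pJ/FMA for an FP8 tensor core.
# All figures are illustrative assumptions, not vendor-measured numbers.

def pj_per_fma(power_watts: float, fma_per_second: float) -> float:
    """Energy per FMA in picojoules: (joules / op) * 1e12."""
    return power_watts / fma_per_second * 1e12

# Hypothetical accelerator: 700 W sustained, 2e15 FP8 FMA/s sustained.
print(pj_per_fma(700, 2e15))  # 0.35 pJ/FMA
```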
High context people should write more 'requests for research / project' to give talented but unknown people threads to pull on and break into their worlds. Loosely "If you did a world class blog post on X, I know many people who'd be excited to meet you".
cool thoughts here - in particular i resonate with the focus on research, rather than cashing in on the high-ROI applications (e.g. agentic products). also a glimpse of the ai talent ecosystem in china, which is pretty opaque to me since i can't read or speak the language 🙃
some of my own thoughts and opinions behind k2 bigeagle.me/2025/07/kimi-k…
RL is starting to work on nontrivial-to-verify domains :) Also, completely general natural language methods!
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Humanity has prevailed (for now!) I'm completely exhausted. I figure I've had 10h of sleep in the last 3 days and I'm barely alive. I'll post more about the contest when I get some rest. (To be clear, those are provisional results, but my lead should be big enough.)
I recently made the news because of a doc I wrote in Meta’s GenAI organization. ‘The Information’ wrote about it as if I did a big raging ‘mic drop’ before leaving the company. Nothing could be further from the truth - so setting the record straight here. open.substack.com/pub/blankevoor…
GPUs are not optimal GEMM machines, though - they take large haircuts from the intra-SM and out-of-SM data movement (see the 10TB/s D2D on GB200 - not cheap at all!). Both are left out of the Bill Dally slide you show. TPUs are closer but still suffer from an enormous crossbar.
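The "haircut" point can be sketched with rough per-byte energy numbers in the spirit of Dally-style energy tables. Every constant below is an assumption for illustration, not a measurement of any specific chip:

```python
# Illustrative energy budget: compute vs. data movement.
# All constants are rough assumptions, not measurements.

FMA_PJ = 0.5          # assumed FP8 FMA energy, pJ
SRAM_PJ_PER_BYTE = 5  # assumed on-chip SRAM access energy, pJ/byte
D2D_PJ_PER_BYTE = 80  # assumed die-to-die-class transfer energy, pJ/byte

def movement_haircut(bytes_moved_per_fma: float, pj_per_byte: float) -> float:
    """Ratio of data-movement energy to compute energy per FMA."""
    return bytes_moved_per_fma * pj_per_byte / FMA_PJ

# If each FMA's operands came fresh from SRAM (2 FP8 inputs = 2 bytes),
# movement would cost 20x the FMA itself:
print(movement_haircut(2, SRAM_PJ_PER_BYTE))
# Tiling/reuse amortizes this; e.g. 0.01 bytes/FMA crossing a D2D link:
print(movement_haircut(0.01, D2D_PJ_PER_BYTE))
```

The design takeaway is that the haircut is governed by bytes moved per FMA, which is exactly what tiling and operand reuse inside a GEMM engine try to minimize.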
It might not be a literal scam, in the same way that GB200 has "30x" H100 performance (*for specific model size, latency constraints, and parallelism strategies that really favor GB200). Not the kind of subtlety VCs are paying for, though.
At the end of the day we are responsible for our availability.
OpenAI o3-pro is rolling out now to all Pro users in ChatGPT and in the API.
hardware is hard! how do you know the hundreds of manufacturing steps for the billions of transistors and wires on your chip all worked correctly? this article is both super cool and super obvious - just stress test every instruction. it's just like how many hyperscalers burn-in…
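The "stress test every instruction" idea boils down to: run a deterministic computation many times against a known-good answer and flag silent mismatches. A minimal sketch (a real screen would cover far more instructions, data patterns, and thermal conditions; the `stress` helper here is hypothetical):

```python
# Minimal sketch of instruction stress testing for silent data corruption:
# repeat a deterministic op and report any trial that diverges from the
# golden result. Real hyperscaler screens are vastly more thorough.

def stress(op, golden, trials: int = 10_000) -> list[int]:
    """Return indices of trials whose result diverged from the golden value."""
    return [i for i in range(trials) if op() != golden]

# Example: exercise integer multiply with a fixed input pattern.
mul = lambda: 0xDEADBEEF * 0x12345678
faults = stress(mul, 0xDEADBEEF * 0x12345678)
print(len(faults))  # 0 on a healthy core
```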
x.com/i/article/1930…