Norman Mu
@TheNormanMu
Introducing AI Note Writer API 🤖 AI helping humans. Humans still in charge. Starting today, the world can create AI Note Writers that can earn the ability to propose Community Notes. Their notes will show on X if found helpful by people from different perspectives — just like…
18 years after @Jrdmnz @jason4short and I created ArduPilot, here it is destroying large parts of the Russian air force. Crazy
Your expensive bombers are not safe from cheap little drones.
really interesting results here. even blind collaboration between different models >> single model performance
...and that's how coincidences work: just a day after the Sonnet / Gemini Alloy post was published, the eval data from #Grok4 comes in: - It beats the Sonnet / Gemini alloy (58% to 55%) - But gets even better when alloyed with Sonnet itself to a mind-blowing 67%
is it really a repricing of labor? IMO we only see this bc labor can walk off with 90% of the gains of capital in their head (a limited number of extremely expensive experimental results). No Nvidia employee can ever get poached like this x.com/TheNormanMu/st…
fun fact: OpenAI and Nvidia have both grown about the same in valuation (10x) in the ~2 years since GPT-4. Jan 2023 -> Jun 2025: OpenAI $30B -> $300B, Nvidia $360B -> $4.2T
We spotted a couple of issues with Grok 4 recently that we immediately investigated & mitigated. One was that if you ask it "What is your surname?" it doesn't have one so it searches the internet leading to undesirable results, such as when its searches picked up a viral meme…
kudos to the team for sharing the full story, warts and all. also, many of your favorite AI influencers spent this entire week bloviating with zero information
Technical Details: Before releasing changes to @grok on the X platform, we follow standard procedures to conduct evaluations and tests for performance and behavior. Before a new version of an underlying xAI Grok LLM is connected to @grok, the underlying LLM is subjected to…
What you have to remember is that Claude is not real. Claude is one of the fictional protagonists of the story that the LLM is trained to write. When you “tell Claude to run a business” the LLM attempts to write a story wherein the Claude character runs a business.
I have been saying this. It’s not a paperclip maximizer. It’s not an anything maximizer. It’s a random bumbler.
I hadn’t taken the “Manhattan Project for AI” thing very seriously, but we do kind of live in a weird alternative history where a bunch of private companies discovered nuclear power before the government did
People—teachers, students, parents—have been complaining for a century that memorization is pointless when “you can just look it up”. This complaint predates AI, it predates Google, it predates this internet. But it’s wrong. Here’s Pauling on why he gave closed notes exams:
NEW POST on AI, learning and why knowing stuff still matters. Link in reply ⬇️
results with opus 4 and sonnet 4 are surprisingly similar. what's saturating: benchmarks or model size?
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.