Gary Marcus
@GaryMarcus
Built two AI companies, wrote six books, tried to warn you about a lot of things.
Detailed comparison of the new DeepMind and OpenAI gold performances (to the extent that is even possible), co-written with @ErnestSDavis, only at Marcus on AI.

I seriously don’t see how alignment could ever possibly be achieved in (pure) LLMs in light of results like these. Game. Set. Match. We need to move on.
Another day, another completely unexpected @OwainEvans_UK result. LLMs are weirder, much weirder than you think. Good luck keeping them safe, secure, and aligned.
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
welp! serious problem for scaling test time compute?
New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
a fool and his money soon part
The @xAI goal is 50 million in units of H100 equivalent-AI compute (but much better power-efficiency) online within 5 years
👇Breaking coda to the IMO Gold story: In some ways – especially transparency and replicability – this dances circles around the OpenAI and DeepMind IMO reports. Not only does @lyang36 get comparable performance out of an already publicly available model, supplemented with some…
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
What I don’t understand is why MAGA ever discussed these files at all. How could they have not seen this day would come?
Wow! The Epstein files must be really, really bad for Trump.
🤔
What a charlatan @GillVerd / @BasedBeffJezos is. First minute into his hourlong interview about thermodynamic computing, he casually implies the human brain is “operating near the Landauer limit” 🤡 It's actually about 10^8 times less efficient. How about some pushback @MLStreetTalk?
Does fascism plus chatbots rate with the Enlightenment?
Eric Schmidt believes we are entering a new epoch, comparable to the Enlightenment. Just as humans shifted from faith to reasoning, we are now facing non-human intelligence with superior reasoning skills. This rise of AGI and superintelligence will happen faster than most…
We were *SO* close!
When the AI turns us all into paper clips it may well be because someone at the Pentagon or in Beijing forgot to add the right special token at the end of a prompt. 🤷♂️