Peter Gostev
@petergostev
London 🇬🇧 · Head of AI · https://www.linkedin.com/in/peter-gostev/
This chart shows the best image generation model at any given time, based on the @ArtificialAnlys arena and the model release dates. A few points stand out:
- Massive gains from Dall-e 2 up to Midjourney 6.
- Arguably a slowdown in progress for diffusion models since then -…

Unlike with LLMs, the image generation market is a bit less competitive - there are many players, but they are not constantly breaking new ground, and some vendors haven't released a competitive model in many months. In this data from @ArtificialAnlys image arena, we can see…

I'm old enough to remember when, a year ago, 'more than $100m' was considered an 'extreme cost' to train a model

🎵 Announcing Artificial Analysis Music Arena! Vote for songs generated by leading music models across genres from pop to metal to rock & more
Key details:
🏁 Participate in Music Arena and after a day of voting we’ll unveil the world’s first public ranking of AI music models.…
Wow can't believe Apple got a shoutout
The AI action plan has been released.
New 'Pelican Riding a Bicycle' agent benchmark just dropped @simonw video @ 2x speed
I propose a new term ‘subintelliphobia’: the anxiety or fear of not being able to access the highest available intelligence, e.g. when hitting a rate limit for the smartest model
Come on @gdb @sama, Google and Anthropic are currently thrashing you on this critical metric. UK roles open:
- Anthropic: 21
- DeepMind: 17
- OpenAI: 3
OpenAI for UK public services: bbc.com/news/articles/…
I find it problematic that all coding & vibe coding tools default to Sonnet 4, sometimes even without an option to change the model. All vibe-coded apps end up looking the same, encountering the same kind of problems (e.g. 'I see the issue now!'). Similar to AI text slop, you can…
Grok 4 @xai matches Kimi K2's @Kimi_Moonshot coding share in @OpenRouterAI, taking share from Google. Anthropic's share stays steady.

It is crazy to me that some still don't see how big our GPU shortage is:
- Most context windows are <100k
- Delayed rollouts of Agents, Codex
- Full Sora never released
- Veo 3 rollout taking weeks
- Claude constant rate limits
- Even big clouds' default rate limits are…
we will cross well over 1 million GPUs brought online by the end of this year! very proud of the team but now they better get to work figuring out how to 100x that lol
The holy grail: "We developed new techniques that make LLMs a lot better at hard-to-verify tasks."
So what’s different? We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare that to AIME, where answers are simply an integer from 0 to 999.