Peter Gostev
@petergostev
London 🇬🇧 · Head of AI · https://www.linkedin.com/in/peter-gostev/
This chart shows the best image generation model at any given time, based on the @ArtificialAnlys arena and the model release dates. A few points stand out:
- Massive gains from Dall-e 2 up to Midjourney 6.
- Arguably a slowdown in progress for diffusion models since then -…

Unlike with LLMs, the image generation market is a bit less competitive - there are many players, but they are not constantly breaking new ground, and some vendors haven't released a competitive model in many months. In this data from @ArtificialAnlys image arena, we can see…

I'm old enough to remember when, a year ago, 'more than $100m' was considered an 'extreme cost' to train a model

🎵 Announcing Artificial Analysis Music Arena! Vote for songs generated by leading music models across genres from pop to metal to rock & more
Key details:
🏁 Participate in Music Arena and after a day of voting we’ll unveil the world’s first public ranking of AI music models.…
Wow can't believe Apple got a shoutout
The AI action plan has been released.
New 'Pelican Riding a Bicycle' agent benchmark just dropped @simonw video @ 2x speed
I propose a new term ‘subintelliphobia’: the anxiety or fear of not being able to access the highest available intelligence, e.g. when hitting a rate limit for the smartest model
Come on @gdb @sama, Google and Anthropic are currently thrashing you on this critical metric. UK roles open:
- Anthropic: 21
- DeepMind: 17
- OpenAI: 3
OpenAI for UK public services: bbc.com/news/articles/…
I find it problematic that all coding & vibe coding tools default to Sonnet 4, sometimes even without an option to change the model. All vibe-coded apps end up looking the same, encountering the same kind of problems (e.g. 'I see the issue now!'). Similar to AI text slop, you can…
Grok 4 @xai matches Kimi K2's @Kimi_Moonshot coding share in @OpenRouterAI, taking share from Google. Anthropic's share stays steady.

It is crazy to me that some still don't see how big our GPU shortage is:
- Most context windows are <100k
- Delayed rollouts of Agents, Codex
- Full Sora never released
- Veo 3 rollout taking weeks
- Claude constant rate limits
- Even big clouds' default rate limits are…
we will cross well over 1 million GPUs brought online by the end of this year! very proud of the team but now they better get to work figuring out how to 100x that lol
The holy grail: "We developed new techniques that make LLMs a lot better at hard-to-verify tasks."
So what’s different? We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare that to AIME, where answers are simply an integer from 0 to 999.