Nikolay Savinov 🇺🇦
@SavinovNikolay
Research Scientist at @GoogleDeepMind Work on LLM pre-training in Gemini ♊ Lead 10M context length in Gemini 1.5 📈
I've been leading long context in Gemini for a while now, and today I’m proud to share what the team has achieved: over 1M context in a large-scale foundation model. Big shoutout to @TeplyashinDenis and @machelreid - without you this would not have happened! youtube.com/watch?v=wa0MT8…
The new update to Gemini 2.5 Pro is awesome. It’s totally incredible to me that I can now dump 2 megabytes of code (36.5k lines of mostly Python and some HTML/JS) and it can do a really great job understanding everything and helping me. The conversation STARTS with 470k tokens!
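For readers who want to try the same thing, here's a minimal sketch of dumping a codebase into a chat and checking the token count first. It assumes the google-genai SDK; the project path and file extensions are illustrative, not the author's actual setup.

```python
# A minimal sketch (not the author's actual workflow): concatenate a local
# codebase into one prompt and check the token count before starting a chat.
# Assumes the google-genai SDK and a GEMINI_API_KEY in the environment.
import pathlib
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Gather the source files; the root path and extensions are illustrative.
root = pathlib.Path("my_project")  # hypothetical project directory
sources = []
for path in sorted(root.rglob("*")):
    if path.suffix in {".py", ".html", ".js"}:
        sources.append(f"# FILE: {path}\n{path.read_text(errors='ignore')}")
codebase = "\n\n".join(sources)

# Count tokens first so you know how much of the context window you're using.
count = client.models.count_tokens(model="gemini-2.5-pro", contents=codebase)
print(f"Prompt size: {count.total_tokens} tokens")

chat = client.chats.create(model="gemini-2.5-pro")
reply = chat.send_message(codebase + "\n\nExplain the overall architecture.")
print(reply.text)
```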
I'm speaking at Cambridge University next Wednesday; please join me if you're interested in long context! Thanks @PetarV_93 for organizing this!
if you are in cambridge next wednesday and interested in long-context research, here's a talk not to be missed! from the absolute maestro @SavinovNikolay
China's Gaokao is the biggest exam in the world: 13M test takers and 9hrs. ~0.02% make it to the top uni, Tsinghua. As of this week, AI models can make it too. 625/750 is top 1%ile. Highest human score is ~720-740. Gemini 2.5 Pro gets 655, barely making the cut for Tsinghua!
Really exciting to see the UK government choose Gemini to help speed up the planning process across the country.
Extract – a system built by the UK government, using our Gemini foundation model – will help council planners make faster decisions. 🚀 Using multimodal reasoning, it turns complex planning documents – even handwritten notes and blurry maps – into digital data in just 40s.…
Gemini 2.5 Pro 06-05 has set a new SOTA on the aider polyglot coding benchmark, scoring 83% with 32k thinking tokens. The default thinking mode, where Gemini self-determines the thinking budget, scored 79%. Full leaderboard: aider.chat/docs/leaderboa…
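For context, the thinking budget is configurable per request. Here's a minimal sketch, assuming the google-genai SDK; the 32k budget mirrors the benchmark setting above, and the prompt is just a placeholder.

```python
# A minimal sketch of pinning Gemini's thinking budget instead of letting the
# model choose it. Assumes the google-genai SDK.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Refactor this function to avoid the O(n^2) loop: ...",
    config=types.GenerateContentConfig(
        # Fixed budget of 32k thinking tokens; omit thinking_config entirely
        # to let Gemini self-determine the budget (the default mode).
        thinking_config=types.ThinkingConfig(thinking_budget=32768),
    ),
)
print(response.text)
```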
The new Gemini 2.5 Pro is SOTA at long context, and especially capable when a higher number of items (needles) has to be retrieved, as shown below!
🚨Breaking from Arena: @GoogleDeepMind's new Gemini-2.5-Flash climbs to #2 overall in chat, a major jump from its April release (#5 → #2)! Highlights: - Top-2 across major categories (Hard, Coding, Math) - #3 in WebDev Arena, #2 in Vision Arena - New model at the…
Today we’re updating Gemini 2.5 Flash. The new 2.5 Flash is better at reasoning, code and long context, and is second only to 2.5 Pro on the @lmarena_ai dashboard — while maintaining the speed devs love. Preview it now in Google AI Studio, @GoogleCloud Vertex AI and the…
Cursor now supports 1M-token context windows, the most expensive models, and unlimited tool calls with API pricing.
How do you think we should be pricing for very expensive models (o3) or ultra-long context windows (~100k LOC) in Cursor? API pricing, higher subscription tier, something else?
Crazy times. This morning, while getting breakfast, I instructed Gemini to write a Python script that connects to my Spotify, pulls my liked songs, and uses Gemini to recommend more songs. Each time I run the script, my Spotify playlist gets updated with 20 new recommended songs…
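The author's actual script isn't public, but here's roughly what it could look like: a minimal sketch assuming spotipy, the google-genai SDK, Spotify app credentials in the environment, and a hypothetical PLAYLIST_ID for the target playlist.

```python
# A minimal sketch of the script described above (the real one isn't public).
# Assumes spotipy + the google-genai SDK; PLAYLIST_ID is hypothetical.
import os
import spotipy
from spotipy.oauth2 import SpotifyOAuth
from google import genai

PLAYLIST_ID = os.environ["PLAYLIST_ID"]  # hypothetical target playlist

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    scope="user-library-read playlist-modify-public"))
client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Pull liked songs from Spotify.
liked = sp.current_user_saved_tracks(limit=50)["items"]
names = [f'{t["track"]["name"]} by {t["track"]["artists"][0]["name"]}'
         for t in liked]

# Ask Gemini for 20 new recommendations, one per line.
prompt = ("I like these songs:\n" + "\n".join(names) +
          "\n\nRecommend 20 more songs I'd like, one 'title by artist' per line.")
recs = client.models.generate_content(
    model="gemini-2.5-pro", contents=prompt).text.splitlines()

# Resolve each recommendation to a Spotify URI and add it to the playlist.
uris = []
for rec in recs:
    hits = sp.search(q=rec, type="track", limit=1)["tracks"]["items"]
    if hits:
        uris.append(hits[0]["uri"])
if uris:
    sp.playlist_add_items(PLAYLIST_ID, uris)
```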
Gemini’s attention to detail is wild. Fed a ~400k token codebase and forgot about it. Found the tab still open a day later. Just for fun, I dumped the entire project again after refactoring it for a full day and asked it to guess my goal.
Caching of tokens when you use the same input context repeatedly now happens implicitly, making things a lot easier to deal with.
We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache 🚢 We also lowered the min token required to hit caches to 1K on 2.5 Flash and 2K on 2.5 Pro!
No need to explicitly specify caching in Gemini API anymore, now it's done automatically - important for long-context applications, e.g. coding!
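Here's a minimal sketch of what this looks like from the client side, assuming the google-genai SDK; the dump file is a placeholder. Keeping the long shared context at the front of the prompt is what lets the second request hit the implicit cache.

```python
# A minimal sketch of implicit caching: send two requests that share the same
# long prefix and inspect how many prompt tokens were served from cache.
# Assumes the google-genai SDK; the dump file is a placeholder.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

big_context = open("my_project_dump.txt").read()  # hypothetical long prefix

# Put the shared context first and the varying question last, so the common
# prefix can be matched against the cache on the second request.
for question in ["Summarize the architecture.", "List the public APIs."]:
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=big_context + "\n\n" + question,
    )
    usage = response.usage_metadata
    # cached_content_token_count is None/0 on a miss, positive on a hit.
    print(question, "cached prompt tokens:", usage.cached_content_token_count)
```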
1. Take a screen recording explaining your app. 2. Upload it to YouTube. 3. “Build me this.” Gemini 2.5’s ability to comprehend video feels straight out of a science fiction novel.
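A minimal sketch of step 3, assuming the google-genai SDK and a placeholder YouTube URL; the Gemini API accepts public YouTube links as video input.

```python
# A minimal sketch of pointing Gemini at a video and asking it to build the
# app. Assumes the google-genai SDK; the YouTube URL is a placeholder.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        # Public YouTube URLs can be passed directly as file_data parts.
        types.Part(file_data=types.FileData(
            file_uri="https://www.youtube.com/watch?v=YOUR_VIDEO_ID")),
        types.Part(text="Build me this app. Return a single HTML/JS file."),
    ],
)
print(response.text)
```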
Nice study on scaling needles for MRCR!
1/ Context Arena Update: Added MRCR 4needle and 8needle results for some of the top models. It's likely we'll get more model releases over the next 2 weeks; I'll try my best to keep up. 😅 Top Results (4needle, AUC @ 1M): 1. Gemini 2.5 Flash Preview…
Pretty awesome result from the new version of Gemini 2.5. I changed one line of War and Peace, inserting a sentence into Book 14, Chapter 10 (halfway through), where Princess Mary "spoke to Crab Man the superhero". Gemini 2.5 consistently found this reference among 860,000 tokens.
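Here's roughly how to reproduce that needle-in-a-haystack test yourself: a minimal sketch assuming the google-genai SDK and a local plain-text copy of the novel; the inserted sentence is paraphrased from the tweet.

```python
# A minimal sketch of the needle-in-a-haystack test described above.
# Assumes the google-genai SDK and a local plain-text copy of the novel.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

text = open("war_and_peace.txt").read()  # e.g. the Project Gutenberg text
needle = "Princess Mary spoke to Crab Man the superhero."  # paraphrased needle

# Insert the needle roughly halfway through the book.
mid = len(text) // 2
haystack = text[:mid] + " " + needle + " " + text[mid:]

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=haystack + "\n\nOne sentence in this book mentions a superhero. "
             "Quote it exactly and say where it appears.",
)
print(response.text)
```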
So many category strengths from the latest Gemini-2.5-Pro. 💻 Coding, Math, Creative Writing, Longer Query, ... the community loved it across all categories.
We’re releasing an updated Gemini 2.5 Pro (I/O edition) to make it even better at coding. 🚀 You can build richer web apps, games, simulations and more - all with one prompt. In @GeminiApp, here's how it transformed images of nature into code to represent unique patterns 🌱
Gemini 2.5 Pro just got an upgrade & is now even better at coding, with significant gains in front-end web dev, editing, and transformation. We also fixed a bunch of function calling issues that folks have been reporting; it should now be much more reliable. More details in 🧵
Artificial Pokémon Intelligence achieved! 😀 It's been a lot of fun to watch - congrats to the Gemini team and thanks @TheCodeOfJoel!
What a finish! Gemini 2.5 Pro just completed Pokémon Blue!  Special thanks to @TheCodeOfJoel for creating and running the livestream, and to everyone who cheered Gem on along the way.