Shane Gu
@shaneguML
Gemini - RL, CoT, multilinguality. Senior Staff RS @GoogleDeepMind MTV. 🇯🇵-born 🇨🇳🇨🇦. ex: @OpenAI (JP: @shanegJP)
2023 was a tough year for me (family with 1yo and a three body problem), and coming back to Google was the best decision I made that year. It was also good for me to come back before the Gemini launch and get to know heroes behind the scenes. Many blockers are resolved or being…
What a way to celebrate one year of incredible Gemini progress -- #1🥇across the board on overall ranking, as well as on hard prompts, coding, math, instruction following, and more, including with style control on. Thanks to the hard work of everyone in the Gemini team and…
"Prioritization". I prioritized what to do based on a simple metric "(impact remaining)/(# top talent working)". Each research, model, product, or business has finite impact, and as more progresses are made and more talented people notice its impact and get in, this metric…
My team worked on a critical fix that affected ~1% of traffic. It required a review from a Google Fellow. Great to push the fix in for this important model, and big thanks to the amazing support from serving eng. Google prod engineers are just incredible.
Gemini 2.5 Flash-Lite, our fastest and most cost-effective model, is now stable and ready for scaled production use!! It comes with native reasoning capabilities, a 1 million token context window, and is priced at $0.10 / 1M input tokens and $0.40 / 1M output tokens.
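A quick worked example of the quoted pricing; the request sizes here are made up for illustration.

```python
# Cost estimate at the quoted Flash-Lite prices; token counts are hypothetical.
INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens

def request_cost(input_tokens, output_tokens):
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

# e.g. 10k tokens in, 1k tokens out
print(f"${request_cost(10_000, 1_000):.6f}")  # $0.001400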
After YEARS of waiting and uncertainty, green cards for my family are approved, and we moved to Silicon Valley from Japan last week. I've been in the Bay since 2012 on/off but this is the most exciting time to be here. Excited to join my great colleagues and make ASI🔥

Why I decided to do RL in 2016, after trying MuProp to train a Neural Programmer and backprop failed me
For differentiable problems, there’s backpropagation. For everything else, there’s RL.
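A toy illustration of that contrast: when the objective is a black-box reward (e.g., whether a discrete program executed outside the graph passes its tests), you cannot backprop through it, but a score-function (REINFORCE-style) gradient estimator still works. This is a generic sketch of that idea, not the MuProp estimator or the Neural Programmer setup from the post.

```python
import math
import random

# Toy contrast: maximize a reward over a discrete action, where the reward
# is a non-differentiable black box, using a REINFORCE-style score-function
# gradient on a Bernoulli policy parameterized by a logit `theta`.

def black_box_reward(action):
    # Non-differentiable: stand-in for "did the sampled program pass its tests".
    return 1.0 if action == 1 else 0.0

theta = 0.0  # logit of P(action = 1)
lr = 0.1
for step in range(200):
    p = 1.0 / (1.0 + math.exp(-theta))
    action = 1 if random.random() < p else 0
    r = black_box_reward(action)
    # d/dtheta log pi(action | theta) for a Bernoulli policy is (action - p).
    grad_logp = action - p
    theta += lr * r * grad_logp  # REINFORCE update (no baseline, for brevity)

print("P(action=1) after training:", 1.0 / (1.0 + math.exp(-theta)))
```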
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
IMO 🥇achieved! Really proud to have contributed to the post-training and thinking side of this model! Getting closer to ASI!
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
So what kind of revenue share are we talking about :D jk jk
New blog post about asymmetry of verification and "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification, the idea that some tasks are much easier to verify than to solve, is becoming an important concept now that we have RL that finally works generally. Great examples of…
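A toy illustration of the asymmetry (my own example, assumed for illustration and not taken from the linked post): checking a proposed subset-sum certificate is cheap, while finding one by brute force is exponential in the input size.

```python
from itertools import combinations

# Illustrative asymmetry of verification: verifying a subset-sum solution is
# linear in the subset size, while naive solving is O(2^n).

def verify(nums, subset, target):
    # Cheap: membership check plus a sum.
    return all(x in nums for x in subset) and sum(subset) == target

def solve(nums, target):
    # Expensive: brute-force search over all subsets.
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return list(combo)
    return None

nums = [3, 34, 4, 12, 5, 2]
sol = solve(nums, 9)
print(sol, verify(nums, sol, 9))  # e.g. [4, 5] True
```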
The Grok team is internalizing human data ops (e.g., recruiting for an AI tutor role for Japanese). Likely more frontier labs are thinking about owning and operating the data labor. x.com/K920_/status/1…
xAI, the company behind Grok, is recruiting Japanese-language AI Tutors. The work involves labeling and annotating Japanese text, audio, and video data. It's fully remote, workable from Japan, with US-level hourly pay. 🗣️ Native Japanese speaker 🧑💻 Fully remote 💰 $35–65/hour (¥5,200–9,600) 🕐 6-month contract (extendable)
If you are at ICML and interested in RL or multilinguality, please say hi to @marafinkels! We worked closely over the past few months to ship an RL method to fix a critical Gemini quality issue. She has great research ideas as well! Hope Gemini x academia stay in touch.
LLMs are typically evaluated w/ automatic metrics on standard test sets, but metrics + test sets are developed independently. This raises a crucial question: Can we design automatic metrics specifically to excel on the test sets we prioritize? Answer: Yes! arxiv.org/abs/2411.15387
Grok is going viral in Japan for very predictable reasons
