Archit Sharma
@archit_sharma97
RL, post-training, reasoning research @GoogleDeepMind | co-created: Gemini Deep Think series, DPO | prev: @Stanford @Google Brain @IITKanpur @MILAMontreal
when i finished grad school, part of me was hoping that i would no longer be working on results right up to deadline…excited about tomorrow!
I’m excited to share the news of Gemini Deep Think’s gold-medal level performance 🥇 at the International Math Olympiad! It has been an absolute blast building Deep Think this year and then scaling it to the IMO.
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
It's worth noting that a DeepThink system with no access to this corpus also got gold (again according to the official graders), with exactly the same score.
who will solve P6 with a general RL method first?
I have been waiting for this to be announced, it’s so amazing to see such elegant scaling of the Deep Think system where the same system can now achieve a gold at IMO! deepmind.google/discover/blog/…
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
it’s rare to find surprising results in the AI scene — I was not expecting AI to slowdown devs in any scenario!
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Gemini 2.5 Pro Deep Think is an SVG artist! Prompt: "Draw a SVG of a Pelican riding a bicycle" Left: Gemini 2.5 Pro Right: Gemini 2.5 Pro Deep Think Credit: simonwillison.net/2024/Oct/25/pe…
🚀🤔 Huge effort from our world class Research, Inference & Deployment teams
Deep Think in 2.5 Pro has landed. 🤯 It’s a new enhanced reasoning mode using our research in parallel thinking techniques - meaning it explores multiple hypotheses before responding. This enables it to handle incredibly complex math and coding problems more effectively.
Presenting this @iclr_conf Saturday 3-5:30 Hall 2B/3 poster 540. Come say hi!!
Feeling spooked👻🎃? Get grounded...introducing "Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval." Meet LeReT (Learning to Retrieve by Trying), a RL-based framework that improves LLM’s ability to use retrieval tools by up to 29%. sherylhsu.com/LeReT/