Ani Baddepudi
@AniBaddepudi
product, model behavior @googledeepmind
yeah
Ani, are you ok? So, Ani are you ok? Are you ok, Ani? Ani, are you ok? So, Ani are you ok? Are you ok, Ani?
We're imagining a future where Gemini can see what you see -- as @AniBaddepudi says, "Everything is vision." Catch Ani & @OfficialLoganK talking about Gemini's SOTA ability to understand videos, images, documents, how we got here and where we're going! youtube.com/watch?v=K4vXva…
A conversation with @AniBaddepudi about Gemini's vision capabilities, how we got to SOTA, and where we and the ecosystem go next. Ani is a friend and collaborator so this conversation was a lot of fun : )
🙏 mathenchant.wordpress.com/2025/06/17/rem… blogs.ams.org/matheducation/… Kelly founded the Hampshire College Summer Studies in #Math program in 1971, one of the oldest math camps in the USA. He influenced many people who went on to make great contributions in math and beyond. en.m.wikipedia.org/wiki/Hampshire…
How does an AI model actually learn to see? 🤖 Learn about the tech behind native multimodality, how models reason over visual data like documents and video, and the future of proactive AI assistants with @OfficialLoganK and Gemini Model Behavior Product Lead, @AniBaddepudi. ↓…
Gemini is becoming a much more helpful & enjoyable model to interact with, lots more to come! 2.5 Pro 05-06 vs the 2.5 Pro update
Our latest update to Gemini 2.5 Pro is here. It's SoTA on GPQA Diamond, AIDER and HLE. The team has also worked hard to improve the model on style, persona and creativity. We're excited to see what you build with it. Please let us know any feedback as we're eternally cooking.
The Gemini 2.5 models are magical for analyzing sports video. We asked Gemini to find Draymond's defensive plays from a highlights reel, which requires the model to:
- reason “over pixels” to identify defensive plays
- identify players in the video using its world knowledge
-…
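For developers who want to try the same kind of workflow, here's a minimal sketch assuming the google-genai Python SDK and the Files API; the clip filename, model string, and exact prompt are placeholders rather than the setup used above:

```python
import time
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# Upload a local highlights clip via the Files API (filename is a placeholder).
video = client.files.upload(file="warriors_highlights.mp4")

# Wait for the service to finish processing the video before prompting over it.
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = client.files.get(name=video.name)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        video,
        "Find every defensive play by Draymond Green in this reel. For each one, "
        "give the timestamp and a one-line description of what he does.",
    ],
)
print(response.text)
```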
Gemini 2.5 Pro (05-06) is SOTA at most video understanding tasks (by a large margin) 📽️. Lots of work by the Gemini multimodal team to make this happen, excited to see developers push this capability in new ways. More details below!
Thrilled to share our latest advances in video understanding 📽️: Gemini 2.5 Pro is a truly magical model to play with, excelling in traditional video analysis and unlocking new use cases I could not imagine a few months ago🪄 More in 🧵 and @Google blog: developers.googleblog.com/en/gemini-2-5-…
Gemini 2.5 Pro is incredible at video understanding. Try pasting a YouTube link into AI Studio (ai.dev) and asking it questions about the video. You will be amazed!
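If you'd rather call the API than use AI Studio, a rough sketch with the google-genai Python SDK looks like this; the video URL and model string below are placeholders:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# Pass a public YouTube URL as file_data, then ask questions about the video.
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=types.Content(
        parts=[
            types.Part(
                file_data=types.FileData(
                    file_uri="https://www.youtube.com/watch?v=XXXXXXXXXXX"  # placeholder link
                )
            ),
            types.Part(text="Summarize this video and list its key moments with timestamps."),
        ]
    ),
)
print(response.text)
```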
Although the vision leaderboard doesn't capture every vision use case, 60+ Elo points reflect a significant step in core vision capabilities like transcription, spatial understanding, reading charts/diagrams & many more. Still a lot more to do, but 2.5 Pro is the best vision…

2.5 Flash is a crazy good workhorse model for high-volume vision workloads. For $10, you can process 55+ hrs of video or ~250K (!!) document pages with market-leading quality – a huge step up from 2.0 Flash. And it's super fast and fun to use!
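As a rough sketch of what a document workload looks like against the API (assuming the google-genai Python SDK; the filename, model string, and prompt are placeholders):

```python
import pathlib
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# Send one PDF inline as bytes; a high-volume job would loop this over many documents.
pdf_bytes = pathlib.Path("quarterly_report.pdf").read_bytes()  # placeholder file

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
        "Transcribe this document, preserving headings and rendering tables as Markdown.",
    ],
)
print(response.text)
```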
2.5 Flash is a huge jump from 2.0 (which was a huge jump from 1.5)!
Chatting with video content feels a lot more natural with Gemini 2.5's deeper world knowledge and stronger semantic video understanding -- and it's a ton of fun to play with! Try it out with your YouTube links at ai.dev
