JB Alayrac
@jalayrac
🦩, ♊ - Research Scientist at Google DeepMind
You can now sample at higher frame rates (default 1 FPS), and specify start and end times for videos in the Gemini API! We’ve been blown away by all the ways developers are using Gemini to process videos, and see a ton of devs manually clipping and slowing down videos to use…
We just shipped video FPS support in the Gemini API, so you can dynamically customize how many frames per second you want the model to see, unlocking lots of interesting new video use cases! 📹
The newly generally available Gemini 2.5 Flash and Pro are even better at video understanding than the versions we shared in the blog a month ago. See more details in the tech report 😀
Hot Gemini updates off the press. 🚀 Anyone can now use 2.5 Flash and Pro to build and scale production-ready AI applications. 🙌 We’re also launching 2.5 Flash-Lite in preview: the fastest model in the 2.5 family to respond to requests, with the lowest cost too. 🧵
Many Congratulations to @jianyuan_wang, @MinghaoChen23, @n_karaev, Andrea Vedaldi, Christian Rupprecht and @davnov134 for winning the Best Paper Award @CVPR for "VGGT: Visual Geometry Grounded Transformer" 🥇🎉 🙌🙌 #CVPR2025!!!!!!
By popular request, you can now specify frames per second (fps), as well as start and end times, for videos in AI Studio ⏩
People want Veo 3 so we are giving access to Pro subscribers in 71 countries as of...now!
Veo 3 dropped about 100 hours ago, and it's been on 🔥🔥🔥 ever since. Now, we’re excited to announce: + 71 new countries have access + Pro subscribers get a trial pack of Veo 3 on the web (mobile soon) + Ultra subscribers get the highest # of Veo 3 gens w/ refreshes How to try…
Excited that our work on Gemini Robotics and Gemini spatial understanding has just been featured on the #GoogleIO stage! I believe that a frontier model possessing strong real-world understanding capabilities represents the ultimate path to embodied AGI, and we are making rapid…
The Gemini 2.5 models are magical for analyzing sports video. We asked Gemini to find Draymond's defensive plays from a highlights reel, which requires the model to: - reason “over pixels” to identify defensive plays - identify players in the video using its world knowledge -…
Gemini 2.5 Pro (05-06) is SOTA at most video understanding tasks (by a large margin) 📽️. Lots of work by the Gemini multimodal team to make this happen, excited to see developers push this capability in new ways. More details below!
Worth reading this. The video understanding capabilities of Gemini are fantastic.
Thrilled to share our latest advances in video understanding 📽️: Gemini 2.5 Pro is a truly magical model to play with, excelling in traditional video analysis and unlocking new use cases I could not imagine a few months ago🪄 More in 🧵 and @Google blog: developers.googleblog.com/en/gemini-2-5-…
Gemini 2.5 Pro sets the state of the art on our newly released Minerva video reasoning benchmark by scoring 63.5%. 📜 Paper: arxiv.org/abs/2505.00681… 📊 Dataset: github.com/google-deepmin…
A lot of work went into making Gemini 2.5 SOTA at video understanding. Check out this 🧵 for more details! Looking back at where we were a year ago, the progress really feels phenomenal! So many things to unlock and enable from video 🎥 and we are only getting started!
Gemini 2.5 Pro is incredible at video understanding. Try posting a YouTube link into AI Studio (ai.dev) and asking it questions about the video. You will be amazed!
although the vision leaderboard doesn't capture every vision use case, 60+ elo points reflects the significant step in core vision capabilities like transcription, spatial understanding, reading charts/diagrams & many more. Still a lot more to do, but 2.5 Pro is the best vision…
Recently had the pleasure of lecturing back at Princeton in a grad seminar. I took the opportunity to cover how scaling laws have evolved since their inception, leaning heavily on great external content from my colleagues @borgeaud_s @jalayrac @jacobaustin132 . Content in thread
The Gemini team cooked hard with Gemini 2.5 Pro, it's an awesome model that continues to lead @lmarena_ai - huge congrats to the team! Try it for yourself in the @GeminiApp now. Can't wait for you all to see what else we've been cooking 👀
Breaking: new @OpenAI models shake up the Arena leaderboard🔥 Highlights: - o3 #2 overall, ties Gemini-2.5-Pro at #1 in Style Control, Math, Coding, and Hard Prompts - o4-mini breaks into top 10 and claims #1 in Math, surpassing o1 (!) - GPT-4.1 ranks top-5 in Hard Prompts,…
The pricing for 2.5 Pro is out. Here is the pareto performance of price : lmsys elo visualized as a rainbow.