Max Bain
@maxhbain
research scientist @googledeepmind Gemini, large-scale pretraining, multimodal, audio
High leverage happiness per $ activity: making your daily commute a 20min walk

method #2 to prompt a breakthrough: run those "what if i just..." / "will it break if ..." experiments, purely out of curiosity. the results may surprise you, and give you the clue you need
why does the breakthrough always happen 0-1 days before the deadline
i. yes we're still working on Audio Description ii. come to our workshop in iccv, hawaii iii. enjoy, surf, ohana
Movies are more than just video clips, they are stories! 🎬 We’re hosting the 1st SLoMO Workshop at #ICCV2025 to discuss Story-Level Movie Understanding & Audio Descriptions! Website: slomo-workshop.github.io Competition: huggingface.co/spaces/SLoMO-W…
The progress of Gemini over the last year +
Gemini 2.5 Flash Preview TTS is pretty crazy. Prompted for both speakers to be opera singers and for one of them to do an impression of a dial-up modem connecting to the internet. Wasn't disappointed.
Thrilled to share our latest advances in video understanding 📽️: Gemini 2.5 Pro is a truly magical model to play with, excelling in traditional video analysis and unlocking new use cases I could not imagine a few months ago🪄 More in 🧵 and @Google blog: developers.googleblog.com/en/gemini-2-5-…
🚨Breaking: @GoogleDeepMind’s latest Gemini-2.5-Pro is now ranked #1 across all LMArena leaderboards 🏆 Highlights: - #1 in all text arenas (Coding, Style Control, Creative Writing, etc) - #1 on the Vision leaderboard with a ~70 pts lead! - #1 on WebDev Arena, surpassing Claude…
We’re releasing an updated Gemini 2.5 Pro (I/O edition) to make it even better at coding. 🚀 You can build richer web apps, games, simulations and more - all with one prompt. In @GeminiApp, here's how it transformed images of nature into code to represent unique patterns 🌱
1) The last 5% is the hardest to write, especially when you didn’t write the first 95%. 2) 99% of these startups are ngmi.
Y Combinator CEO Garry Tan has said that for about a quarter of the current YC startups, 95% of the code was written by AI.
I think people shouldn't do phds anymore. Just focus on hardcore engineering / infra in a big frontier company and branch off to research if you're interested. All the research is happening in frontier labs anyway.
1 year later and openai finally reimbursed me for interview expenses. Thank you @SoftBank 🫶
Introducing Ironwood, the first TPU built for the age of inference, and the timing could not be better : ) - Ironwood perf/watt is 2x relative to Trillium, 6th gen TPU - Ironwood offers 192 GB per chip, 6x that of Trillium - 4.5x faster data access blog.google/products/googl…
Introducing Gemini 2.5 Pro, the world's most powerful model, with unified reasoning capabilities + all the things you love about Gemini (long context, tools, etc) Available as experimental and for free right now in Google AI Studio + API, with pricing coming very soon!
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer…
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →…
Agency > Intelligence I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are…
Intelligence is on tap now so agency is even more important
Something I've realized recently is that oftentimes people love to project their career ambitions on other people. For example, some people may think building large teams or their own fancy research lab (e.g., academia) is the definition of a successful career. They then use this…
+1. After weeks of heavy usage building an iOS app with my brother, claude is still the most useful. Can't do as much in one-shot as some of the newer models but reliability >>>
Deepseek models are available now in Cursor! Hosted on US servers. While we're big fans of Deepseek, Sonnet still appears to perform much better on real-world tasks. Enjoy!
The rise and fall of London: - Once the epicentre of global finance - Now a stagnating city riddled with challenges - Mass exodus of high net worth individuals The reasons for this decline are very complex. Here's a breakdown of what's been happening there and why🧵:
Grateful for how smooth open-source ML is now — 1-line installs, 2-min setup. Pre 2019 this could sometimes take literally two days