Nando de Freitas
@NandoDF
VP, http://microsoft.ai. Understanding & harnessing intelligence responsibly. Past: NPI, AlphaGo tuning, Gato, ReST, AlphaCode, Lyria, Imagen 3, Veo, r-Gemma, Genie ...
Enjoy your summer vacation (or winter, if you're in the Southern Hemisphere). Great for the soul, family, restoration, creativity and reflection. Hence, great for responsible AI 🩵

UK academics: escape GPU poverty! SovAI wants to help… More below👇
Many top academic AI researchers are GPU-poor. The Sovereign AI Unit wants to change this story. We are seeking ambitious AI research proposals from UK academics aiming to take their research to the next level. Read below to find out more and apply 🧵👇
One of the first technical reports in a while with considerable detail. In particular, preference training tends to be much more painful for multimodal models, largely because of a lack of calibrated rewards. Some nice details here about overcoming some of those challenges.
MAI-DxO in action, tackling one of those complex cases:
Meet MAI-DxO, our AI Diagnostic Orchestrator. By emulating a virtual panel of specialists, it boosts base-model accuracy by 10% and solves 85.5% of NEJM cases—versus 20% for practicing physicians (3/6) arxiv.org/abs/2506.22405
New paper today! 🥳 How good is generative AI at diagnosis compared to human doctors? We introduce a novel, interactive medical benchmark (SDBench) for “sequential diagnosis”, and an orchestrator (MAI-DxO) that achieved over 4x higher diagnostic accuracy vs experienced physicians…
We're taking a big step towards medical superintelligence. AI models have aced multiple-choice medical exams, but real patients don't come with ABC answer options. Now MAI-DxO can solve some of the world's toughest open-ended cases with higher accuracy and lower costs.
It is remarkable that a committee of deliberating AI models can diagnose patients: propose the next tests, use the results to refine the diagnosis, order more tests, and eventually arrive at a diagnosis, outperforming doctors. It is a specific setting, but nonetheless…
Microsoft AI built MAI-DxO to simulate a virtual panel of physicians with different approaches collaborating to find a diagnosis on each case. They also included the ability to set a budget to avoid infinite testing (higher costs, longer wait times, etc.).
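Illustratively, such an orchestrated, budget-aware diagnostic loop might look like the sketch below. The panel roles, test prices, and the `ask_model` helper are all hypothetical placeholders, not the actual MAI-DxO implementation:

```python
# Hypothetical sketch of a budget-aware panel-of-physicians loop.
# Roles, test prices, and `ask_model` are illustrative assumptions;
# this is not the actual MAI-DxO implementation.

TEST_COSTS = {"cbc": 20, "ct_scan": 500, "biopsy": 900}  # assumed USD prices

PANEL = [
    "hypothesis-driven internist",
    "cost-conscious generalist",
    "devil's-advocate specialist",
]

def ask_model(role: str, case_state: str) -> dict:
    """Placeholder for an LLM call. Expected to return either
    {"action": "order_test", "test": "cbc"} or
    {"action": "diagnose", "diagnosis": "..."}."""
    raise NotImplementedError

def run_panel(initial_findings: str, budget: float) -> str:
    case_state, spent = initial_findings, 0.0
    while True:
        proposals = [ask_model(role, case_state) for role in PANEL]
        # Simple aggregation rule: commit as soon as any panelist diagnoses.
        for p in proposals:
            if p["action"] == "diagnose":
                return p["diagnosis"]
        # Otherwise run the cheapest requested test that still fits the budget.
        requested = [p["test"] for p in proposals if p["action"] == "order_test"]
        affordable = [t for t in requested
                      if spent + TEST_COSTS.get(t, float("inf")) <= budget]
        if not affordable:
            # Budget exhausted: force a final call instead of testing forever.
            final = ask_model("chief physician, final call", case_state)
            return final.get("diagnosis", "undetermined")
        test = min(affordable, key=TEST_COSTS.get)
        spent += TEST_COSTS[test]
        case_state += f"\n{test} result: (returned by the case environment)"
```

The budget check is what rules out the "infinite testing" failure mode: once no requested test fits the remaining budget, the panel must commit to a diagnosis.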
Huge milestone from the team! A blazing-fast diffusion LLM built for chat, delivering real-time performance at commercial scale. If you liked Mercury Coder for code, you'll love this for conversation.
We’re excited to launch Mercury, the first commercial-scale diffusion LLM tailored for chat applications! Ultra-fast and efficient, Mercury brings real-time responsiveness to conversations, just like Mercury Coder did for code.
💼 [Career Update] I feel fortunate to have spent the past 4 years at Meta, working with some of the brightest minds to build Meta’s multimodal foundation models — MovieGen, Emu, and more. It shaped me not only as a researcher but as a person. Now, it’s time for a new chapter. I…
This is wild - researchers built a neural network that can control real-world entities like cars, bikes & pedestrians using game controls, without any real-world training labels. It learns from racing games but works on actual footage 🧵
It's great to have this new open-source benchmark for tabular data 🚀 It's really comprehensive, created and maintained by serious open source contributors from various groups, and I expect it to quickly become the new standard benchmark. I'm super excited that progress in the…
🚨What is SOTA on tabular data, really? We are excited to announce 𝗧𝗮𝗯𝗔𝗿𝗲𝗻𝗮, a living benchmark for machine learning on IID tabular data with: 📊 an online leaderboard (submit!) 📑 carefully curated datasets 📈 strong tree-based, deep learning, and foundation models 🧵
jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval - a 3.8B embedding model supporting both single- and multi-vector embeddings in the late-interaction style - SotA performance on single- and cross-modal retrieval, particularly strong on tables, charts, etc.
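For readers new to the term: "late interaction" usually means ColBERT-style MaxSim scoring over multi-vector embeddings. A minimal numpy sketch of that scoring style (shapes and normalization are assumptions here, not jina-embeddings-v4's actual API):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (ColBERT-style MaxSim) relevance score.

    query_vecs: (n_query_tokens, d) L2-normalized token embeddings
    doc_vecs:   (n_doc_tokens, d)   L2-normalized token embeddings
    Each query token is matched to its most similar document token;
    the per-token maxima are summed into a single relevance score.
    """
    sim = query_vecs @ doc_vecs.T        # (n_q, n_d) cosine similarities
    return float(sim.max(axis=1).sum())  # best doc match per query token

def dense_score(q: np.ndarray, d: np.ndarray) -> float:
    """Single-vector mode: the usual one-embedding-per-item dot product."""
    return float(q @ d)
```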
Monday on Starkly Speaking: understanding a core diffusion-model ingredient better, classifier-free guidance. Via "Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms" arxiv.org/abs/2502.07849. On Zoom, 12pm ET / 6pm CEST: portal.valencelabs.com/starklyspeaking
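For reference, the ingredient in question combines conditional and unconditional predictions at sampling time. A minimal sketch of the standard formulation (the `model` call signature is a placeholder):

```python
def cfg_denoise(model, x_t, t, cond, guidance_scale: float):
    """Standard classifier-free guidance: extrapolate from the
    unconditional prediction toward the conditional one.
    `model` and its signature are placeholders."""
    eps_uncond = model(x_t, t, cond=None)  # unconditional branch
    eps_cond = model(x_t, t, cond=cond)    # conditional branch
    # guidance_scale = 1 recovers the plain conditional model;
    # larger values sharpen adherence to the condition.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```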
New paper on the generalization of Flow Matching arxiv.org/abs/2506.03719 🤯 Why does flow matching generalize? Did you know that the flow matching target you're trying to learn **can only generate training points**? with @Qu3ntinB, Anne Gagneux & Rémi Emonet 👇👇👇
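To unpack that claim: in the standard conditional-OT flow matching setup, the exact regression target is a marginal velocity field whose flow transports noise onto the (finite) training set. A sketch, under those standard assumptions:

```latex
% Conditional OT path between noise and data:
%   x_t = (1 - t) x_0 + t x_1,   x_0 ~ N(0, I),   x_1 ~ p_data (empirical)
% The regression target is the marginal velocity field
\[
  u_t(x) = \mathbb{E}\!\left[\, x_1 - x_0 \,\middle|\, x_t = x \,\right].
\]
% With a finite training set, integrating dx/dt = u_t(x) from t = 0 to 1
% maps every noise sample exactly onto a training point, so the exact
% target can only generate training data; any generalization must come
% from how the network approximates u_t.
```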
Diffusion models create novel images, but they can also memorize samples from the training set. How do they blend stored features to synthesize novel patterns? Our new work shows that diffusion models behave like Dense Associative Memory: in the low training data regime (number…
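For context, one standard Dense Associative Memory formulation is the modern-Hopfield retrieval step, where the sharpness of a softmax controls memorization versus blending. An illustrative sketch, not the paper's code:

```python
import numpy as np

def dam_retrieve(x: np.ndarray, patterns: np.ndarray, beta: float) -> np.ndarray:
    """One retrieval step of a Dense Associative Memory
    (modern Hopfield network). Illustrative sketch only.

    x:        (d,)   query state
    patterns: (n, d) stored training patterns
    Large beta: sharp softmax, retrieval snaps to a single stored
    pattern (memorization). Small beta: the update blends many
    patterns, synthesizing novel combinations of stored features.
    """
    logits = beta * (patterns @ x)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ patterns                # convex blend of memories
```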
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything: