Si-Qi LIU 刘思奇
@liusiqi42
Staff Research Engineer @DeepMind. AI/ML PhD candidate @UCL. Multiagent RL, Game Theory, Evaluation.
Yet scale it we must!
AI PROMPTING → AI VERIFYING AI prompting scales, because prompting is just typing. But AI verifying doesn’t scale, because verifying AI output involves much more than just typing. Sometimes you can verify by eye, which is why AI is great for frontend, images, and video. But…
Agreed. Unless you have a horse in the race, "where should AGI be built" is just irrelevant. The real question is how we can (continue to) incentivise open models at frontier performance - gatekeeping takes away competition and makes open models harder to justify for incumbents.
Finally took the time to go over Dario's essay on DeepSeek and export controls, and to be honest it was quite painful to read. And I say this as a great admirer of Anthropic and a big user of Claude* The first half of the essay reads like a lengthy attempt to justify that closed-source…
Sweeping generalisation is always the easy way out. It takes courage and independence to resist. Parroting what's been reported about one bad apple and generalising it to all Chinese people on stage at @NeurIPSConf is extremely disappointing. I hope this is an exception.
Mitigating racial bias in LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf. We have ethical reviews for authors, but not for invited speakers? 😡
Very true! When I was an AdWords intern in 2015, I cold e-mailed @sirbayes hoping to work on im2calories just because it was such a cool idea! That specific project didn't continue, but I was still given a chance to learn how to do research on his team. What a ride it has been!
Cold emails are hard and good ones can change a life. Here is my email to @NandoDF that started my career in ML (at the time I was a PM at Google) docs.google.com/document/d/1_u… Real effort (incl feedback) went into drafting it. Thanks to @EugeneVinitsky for nudging me to put it online
Haha, many late nights launching those policy gradient runs only to wake up to those infamous wobbly curves! Little did I know the recipe would go on to scale so well!
My colleague and former intern @liusiqi42 reminded me that we did RLFT for LMs almost 10 years ago - back then it was for an img2text model based on CNNs and RNNs. But same basic recipe - pre-train with MLE, then fine-tune with PG. arxiv.org/abs/1612.00370
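The recipe in the quoted tweet (maximum-likelihood pre-training followed by policy-gradient fine-tuning) can be sketched in a few lines. This is a minimal illustration, not the code from the linked paper: `model`, `reward_fn`, and the REINFORCE-style update below are placeholders/assumptions.

```python
import torch
import torch.nn.functional as F

def mle_step(model, tokens, optimizer):
    # Pre-training step: teacher-forced maximum likelihood, i.e. predict
    # token t+1 from tokens up to t with a cross-entropy loss.
    logits = model(tokens[:, :-1])                      # [batch, len-1, vocab]
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def pg_step(model, prompt, reward_fn, optimizer, max_len=32):
    # Fine-tuning step: sample a continuation from the model, score it with a
    # scalar reward (e.g. a caption metric for img2text), and apply a
    # REINFORCE-style update that reweights the sampled log-probabilities.
    tokens, log_probs = prompt, []
    for _ in range(max_len):
        logits = model(tokens)[:, -1]                   # next-token logits
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        tokens = torch.cat([tokens, action.unsqueeze(-1)], dim=-1)
    reward = reward_fn(tokens)                          # reward per sampled sequence
    loss = -(reward * torch.stack(log_probs).sum(dim=0)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```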
Majestic!
🚨Breaking: New Gemini-2.5-Pro (06-05) takes the #1 spot across all Arenas again! 🥇 #1 in Text, Vision, WebDev 🥇 #1 in Hard, Coding, Math, Creative, Multi-turn, Instruction Following, and Long Queries categories Huge congrats @GoogleDeepMind!
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →…
👀
Breaking News from Chatbot Arena⚡ @GoogleDeepMind Gemini-2.0-Flash debuts at #3 Overall - a massive leap from Flash-002! Highlights (improvement from Flash-002): - Overall: #11 → #3 - Hard Prompts: #15 → #2 - Coding: #22 → #3 - Longer query: #8 → #1 - Overall…
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈 It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵 dpmd.ai/imo-silver
Oh boy we need that colab notebook real bad!
Why has productivity (GDP per hour worked) grown faster in the US than in Europe over the last 15 years? [note: if you think this is because Americans work more than Europeans, you are wrong. Productivity measures output *per hour worked*]
Gemini and I also got a chance to watch the @OpenAI live announcement of gpt4o, using Project Astra! Congrats to the OpenAI team, super impressive work!
It's such an honor to work on Project Astra with such an amazing team from across Gemini and Google DeepMind! While the #GoogleIO keynote was happening we had a last-minute idea of watching the keynote with Project Astra. Check it out!
This Friday, we will have the honour of hosting @liusiqi42 from @GoogleDeepMind and @ucl presenting “NfgTransformer: Equivariant Representation Learning of Normal-form Games”. See y’all there!
👀
Crazy results🔥 On the other hand, all the mocap cameras and markers keep reminding me that it’s not yet possible to get this to work onboard with egocentric vision & sensors 😢 Long way to go💪
Have a view, make it known, and stand by it in public. Kudos to the team for setting an example for the industry!
Here is Claude 3's system prompt! Let me break it down 🧵