Allan Zhou
@AllanZhou17
AI & robotics research @GoogleDeepMind | Prev: PhD @Stanford
How should we order training examples? In a new blogpost (w/ @yidingjiang), we explore a compression-based perspective: order your dataset to minimize its prequential codelength.
Data selection and curriculum learning can be formally viewed as a compression protocol via prequential coding. New blog (with @AllanZhou17) about this neat idea that motivated ADO but didn’t make it into the paper. yidingjiang.github.io/blog/post/curr…
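For the curious, here is a minimal sketch of the prequential idea from the post: encode each example with the model trained on everything before it, then update. `init_model`, `update`, and `log_prob` are hypothetical stand-ins for your learner, not anything from the blog.

```python
import math

def prequential_codelength(ordered_dataset, init_model, update, log_prob):
    """Prequential (predict-then-update) codelength of a dataset ordering.
    Each example x_t is encoded with the model fit to x_1..x_{t-1},
    costing -log2 p(x_t); a better ordering yields fewer total bits."""
    model = init_model()
    total_bits = 0.0
    for x in ordered_dataset:
        total_bits += -log_prob(model, x) / math.log(2)  # nats -> bits
        model = update(model, x)  # online update after encoding
    return total_bits
```

Curriculum learning then becomes a search over orderings of the same dataset for the one with the smallest total codelength.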
re-upping this. I think it contains many important truths and mysteries.
I'm really fascinated by this dataset from the AI poetry survey paper. Here's another visualization I just made. Survey respondents were shown one of these 10 poems and were either told it was authored by AI, told it was authored by a human, or told nothing about its authorship.
This on its own is a super useful capability for roboticists. My top request is, and always has been, a grasping model that just *works*
Gemini Robotics zero-shot picks with a dexterous hand: no prior demos, not even videos. It recognized the object, failed to grasp it (slippery surface), retried with new angles, got help, nailed the pick, and adjusted post-pick. Mad respect to the DeepMind team. Now I really worry about human labor 😅
A mental model I find useful: all data acquisition (web scrapes, synthetic data, RL rollouts, etc.) is really an exploration problem 🔍. This perspective has some interesting implications for where AI is heading. Wrote down some thoughts: yidingjiang.github.io/blog/post/expl…
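One toy way to make that framing concrete (my gloss, not the blog's method): treat each data source as a bandit arm and pick the next acquisition by UCB on some observed utility, e.g. validation-loss improvement per acquired batch. All names here are illustrative.

```python
import math

def next_source(counts, mean_utility, t, c=1.0):
    """Pick the next data source to query, UCB-style.
    counts / mean_utility: per-source pull counts and average utility
    (e.g., validation-loss drop per acquired batch); t: total pulls."""
    def ucb(src):
        if counts[src] == 0:
            return float("inf")  # try every source at least once
        bonus = c * math.sqrt(math.log(t + 1) / counts[src])
        return mean_utility[src] + bonus  # exploit + explore
    return max(counts, key=ucb)
```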
Given the progress in robot mobility & locomotion lately, we are throwing a robot fashion show at CoRL 2025! 😂 Deadline to apply is July 15th. This is going to be weird and legendary - at the Pareto frontier of art and tech, please consider submitting! corl.org/contributions/…
Say ahoy to 𝚂𝙰𝙸𝙻𝙾𝚁⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to recover from mistakes w/o any additional human feedback! 𝚂𝙰𝙸𝙻𝙾𝚁 ⛵ outperforms Diffusion Policies trained via behavioral cloning on 5-10x the data!
For the last 1.5 days, we've been demoing Gemini Robotics *live* to guests at Google I/O! The robot can converse with you while solving a large range of dexterous tasks 🤖 AI + robotics is so exciting, and we're cooking up a storm @GoogleDeepMind. Can't wait to share more 🚀
If you haven't already seen our Gemini Robotics demo at #GoogleIO, today's your chance! You'll give commands to our robots and watch them do the work, using only your voice. If you're not at IO, check out: youtu.be/BKM3vohmED8?fe…
NO WAY. It did it. And, was that, actually funny? Prompt: > a man doing stand up comedy in a small venue tells a joke (include the joke in the dialogue)
> A man is running through a beautiful summer park at dawn, he is out of breath, he slows and stops, looks at the camera and says, while panting, "Run AI with an API. Use Replicate", then he carries on running. Then "Replicate" text fades into view at the end Seems like the…
Back in March, I wore a head-mounted camera for a week straight and fine-tuned ChatGPT on the resulting data. Here's what happened (1/6) arxiv.org/pdf/2504.03857
I will be at #ICLR2025 until Monday. Looking forward to meeting old and new friends. If you want to chat about generalization / RL / curriculum learning / compression & algorithmic info theory (or anything really 😬), please DM me! Otherwise, I will be presenting 2 papers:
Check out our online data selection alg ADO at ICLR 2025! And take a look at this blog post by @yidingjiang and @AllanZhou17 summarizing the key ideas: bland.website/notes/ado/
Selecting good pretraining data is crucial, but rarely economical. Introducing ADO, an online solution to data selection with minimal overhead. 🧵 1/n
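To give a flavor of what "online data selection" can look like, here's a hedged sketch of a generic scheme (explicitly not the ADO algorithm from the paper): keep an EMA of per-domain loss improvement and sample domains in proportion to it, so fast-improving domains are visited more often.

```python
import math
import random

class OnlineDomainSelector:
    """Toy online data selector (illustrative only, not ADO):
    tracks an EMA of per-domain loss improvement and samples
    domains via a softmax over that estimated learning progress."""

    def __init__(self, domains, decay=0.99, temperature=1.0):
        self.decay = decay
        self.temperature = temperature
        self.progress = {d: 1.0 for d in domains}    # optimistic init
        self.last_loss = {d: None for d in domains}

    def sample_domain(self):
        weights = [math.exp(p / self.temperature) for p in self.progress.values()]
        return random.choices(list(self.progress), weights=weights)[0]

    def update(self, domain, loss):
        prev = self.last_loss[domain]
        if prev is not None:
            improvement = max(prev - loss, 0.0)  # loss drop since last visit
            self.progress[domain] = (self.decay * self.progress[domain]
                                     + (1 - self.decay) * improvement)
        self.last_loss[domain] = loss
```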
📊 Are you training LLMs and managing your training data via a DFS? Do you spend a lot of time writing data wrangling/mixing scripts? ⌛ We just posted a preprint on Mixtera, our data plane for LLM/VLM training 🎉 🔗 github.com/eth-easl/mixte… 🔗 arxiv.org/abs/2502.19790 Read more👇
🌎🌏🌍 We are organizing a workshop on Building Physically Plausible World Models at @icmlconf 2025! We have a great lineup of speakers, and are inviting you to submit your papers with a May 10 deadline. Website: physical-world-modeling.github.io
Last month we announced Gemini Robotics (GR) and Gemini Robotics-ER (GR-ER). GR-ER is a powerful VLM specialised for spatial understanding, including detecting object poses in 2D/3D, pointing, and even *predicting grasp poses*. Take a look at this demo. Details below. 🧵
I've just paid out on this bet to @ATabarrok, who won.
I bet @ATabarrok $100 that “Systems in GPT line will by 2025 make <$1B in customer revenue clearly tied to such systems. If product contains such a component, but also has other features, one needs to attribute best estimate % of product revenue to this one.” Right Alex?
o3-mini-high helps accelerate scientific discovery 💙 arxiv.org/abs/2503.23758