Longtao Zheng
@ltzheng01
PhD student @NTUsg. Building open-ended agents in open-ended worlds.
🛠️🤖 Introducing SimpleTIR: An end-to-end solution for stable multi-turn tool use RL 📈 Multi-turn RL training suffers from catastrophic instability, but we find a simple fix ✨ The secret? Strategic trajectory filtering keeps training rock-solid! 🎯 Stable gains straight from…
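A rough sketch of the filtering idea as I read it (illustrative only; the `Turn`/`Trajectory` containers and the void-turn criterion below are my assumptions, not SimpleTIR's actual API): drop any trajectory containing a turn that produced neither a tool call nor a final answer, before the batch ever reaches the policy-gradient loss.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    text: str             # model output for this turn
    has_tool_call: bool   # did the turn emit a well-formed tool call?
    has_answer: bool      # did the turn emit a final answer?

@dataclass
class Trajectory:
    turns: list[Turn]
    reward: float

def is_void_turn(turn: Turn) -> bool:
    # A "void" turn produces neither a tool call nor a final answer;
    # in multi-turn rollouts such turns tend to snowball and destabilize training.
    return not (turn.has_tool_call or turn.has_answer)

def filter_trajectories(batch: list[Trajectory]) -> list[Trajectory]:
    # Keep only trajectories in which every turn did something useful.
    # Filtered-out trajectories are simply excluded from the loss.
    return [traj for traj in batch if not any(is_void_turn(t) for t in traj.turns)]
```

The point is that the filtering happens at the batch level, so unstable trajectories never contribute gradients at all.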
The first film from our partnership with @primordialsoup_ - a storytelling venture founded by visionary director Darren Aronofsky - is debuting at @Tribeca. Directed by Eliza McNitt, ANCESTRA uses traditional filmmaking alongside Veo, our generative video model. Take a look ↓…
This I did not expect. Cool.
Perhaps the most important thing you can read about AI this year: "Welcome to the Era of Experience." This excellent paper from two senior DeepMind researchers argues that AI is entering a new phase—the "Era of Experience"—which follows the prior phases of simulation-based…
Good researchers can smell the BS without even reading the papers :) x.com/agihippo/statu…
Lol the RL papers in the wilderness are wonky ngl
Rich’s slogans for AI research (revised 2006): 1. Approximate the solution, not the problem (no special cases) 2. Drive from the problem 3. Take the agent’s point of view 4. Don’t ask the agent to achieve what it can’t measure 5. Don’t ask the agent to know what it can’t verify…
Rich Sutton, the godfather of reinforcement learning, gave me some golden advice today: work hard, think hard, and play—and don’t have too much respect. I can’t agree more—it’s advice I’ll take to heart.
lol this song is funny🤣
Gave a quick test to MEMO (actually... not that quick: ~20 minutes for a 30s video, on an A100😳). Bonus: #Udio song about the project, and a portrait of yours truly from @artflow_ai. Links, including a Colab, in the next message if anyone is interested.
🔥 MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation 😎 Open source is the way! 🔊
Introducing 🧞Genie 2 🧞 - our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠.
The secret to doing good research is always to be a little underemployed. You waste years by not being able to waste hours. - Amos Tversky
I just tried out playing Counter-Strike in a neural network on my MacBook. In my first run, it diverged into mush pretty quickly. The recording is sped up 5x.
Ever wanted to play Counter-Strike in a neural network? These videos show people playing (with keyboard & mouse) in 💎 DIAMOND's diffusion world model, trained to simulate the game Counter-Strike: Global Offensive. 💻 Download and play it yourself → github.com/eloialonso/dia… 🧵
I've made this point before: video generation systems are not good world models (at least, not necessarily). They could be mode-collapsed, and you wouldn't know.
Coming back to video generation, IMO we have to be careful when we instinctively see it as a world model. The latter is far less tolerant of any form of mode collapse. 1/4
Our new paper on using YouTube videos to learn language conditioned navigation is out! By leveraging pretrained models and video data mined from the web, we can get robots to better understand language instructions.
Excited to share our recent research, LeLaN, for learning language-conditioned navigation policies from in-the-wild video, done at UC Berkeley and Toyota Motor North America. We will present LeLaN at CoRL 2024. @CatGlossop @ajaysridhar0 @shahdhruv_ @oier_mees and @svlevine
Really promising results we got recently: Generative CoT Verifiers trained on only grade-school math problems in GSM8K generalize quite well to much harder *high-school competition* problems in MATH!
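For readers unfamiliar with the setup, here is roughly how a generative CoT verifier can score a solution (a minimal sketch under my assumptions; the checkpoint name is hypothetical and this may differ from the paper's exact recipe): the verifier generates its own chain of thought about the candidate solution, then the score is the probability it assigns to a "Yes" verdict token.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name, for illustration only.
tok = AutoTokenizer.from_pretrained("my-genrm-verifier")
model = AutoModelForCausalLM.from_pretrained("my-genrm-verifier")

def verify(question: str, solution: str) -> float:
    # The verifier first reasons about the solution (CoT), then we read off
    # the probability of "Yes" as the correctness score.
    prompt = (
        f"Question: {question}\nProposed solution: {solution}\n"
        "Let's verify step by step.\n"
    )
    inputs = tok(prompt, return_tensors="pt")
    # Let the verifier write its own chain of thought, then ask for a verdict.
    cot = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    verdict_prompt = tok.decode(cot[0], skip_special_tokens=True) + "\nIs the solution correct? "
    v_inputs = tok(verdict_prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**v_inputs).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    yes_id = tok.encode("Yes", add_special_tokens=False)[0]
    return probs[yes_id].item()
```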
Ever wondered if we can model motion as a language? Can we tokenize this new language? Is it useful? Turns out, tremendously! 🚀 In our latest #NeurIPS2024 paper on QueST: Self-Supervised Skill Abstractions for Learning Continuous Control, we find that action tokenization…
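To make "action tokenization" concrete, here is a toy vector-quantization sketch (my illustration, assuming a nearest-neighbor codebook lookup; QueST's actual architecture is more involved): chunks of continuous actions are mapped to discrete token ids that a sequence model can consume like text.

```python
import torch
import torch.nn as nn

class ActionTokenizer(nn.Module):
    """Toy VQ tokenizer: continuous action chunks -> discrete tokens."""

    def __init__(self, action_dim: int, chunk_len: int, vocab_size: int):
        super().__init__()
        # Codebook of `vocab_size` prototype action chunks (flattened).
        self.codebook = nn.Parameter(torch.randn(vocab_size, action_dim * chunk_len))
        self.chunk_len = chunk_len

    def tokenize(self, actions: torch.Tensor) -> torch.Tensor:
        # actions: (T, action_dim) -> tokens: (T // chunk_len,)
        T, d = actions.shape
        # Drop any trailing actions that don't fill a whole chunk.
        chunks = actions[: T - T % self.chunk_len].reshape(-1, self.chunk_len * d)
        # Nearest codebook entry per chunk is its token id.
        dists = torch.cdist(chunks, self.codebook)
        return dists.argmin(dim=-1)

    def detokenize(self, tokens: torch.Tensor, action_dim: int) -> torch.Tensor:
        # Look up prototypes and unflatten back to a (T, action_dim) trajectory.
        return self.codebook[tokens].reshape(-1, action_dim)
```

Usage: `tokenizer.tokenize(actions)` turns a `(T, action_dim)` trajectory into a short discrete sequence, and `detokenize` reconstructs an approximate trajectory from token ids.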