Qiyue Gao
@QiyueGao123
PhD student @UCSanDiego; Prev intern @allen_ai #AI #ML #NLP
🗒️Can we meta-learn test-time learning to solve long-context reasoning? Our latest work, PERK, learns to encode long contexts through gradient updates to a memory scratchpad at test time, achieving long-context reasoning robust to complexity and length extrapolation while…
I have been long arguing that a world model is NOT about generating videos, but IS about simulating all possibilities of the world to serve as a sandbox for general-purpose reasoning via thought-experiments. This paper proposes an architecture toward that arxiv.org/abs/2507.05169
Some critical reviews and clarifications on different perspectives of world models. 🔥🌶️ Stay tuned for more on PAN: its position on the roadmap towards next-level intelligence, strong results, and open-source releases❗️🧠
💥💥BANG! Experience the future of gaming with our real-time world model for video games!🕹️🕹️ Not just PLAY—but CREATE! Introducing Mirage, the world’s first AI-native UGC game engine. Now featuring real-time playable demos of two games: 🏙️ GTA-style urban chaos 🏎️ Forza…
Thank you for sharing our work! 🚀 Vision-Language Models are advancing rapidly, and it's exciting to track their progress. We'll continuously update our leaderboard and datasets as new VLMs emerge. Stay tuned for more insights and results! Our website: wm-abench.maitrix.org…
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation "we introduce WM-ABench, a large-scale benchmark comprising 23 fine-grained evaluation dimensions across 6 diverse simulated environments with controlled counterfactual simulations. Through 660…