Jake Grigsby
@__jakegrigsby__
UT Austin ML/RL PhD student
🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. Two tracks: ① Showdown Battling – imperfect-info, turn-based strategy ② Pokémon Emerald Speedrunning – long-horizon RPG planning 5M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid…
We took a short break from robotics to build a human-level agent to play Competitive Pokémon. Partially observed. Stochastic. Long-horizon. Now mastered with Offline RL + Transformers. Our agent, trained on 475k+ human battles, hits the top 10% on Pokémon Showdown leaderboards.…
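A minimal sketch of what "Offline RL" on logged human battles can look like: advantage-weighted behavior cloning, which imitates replay actions more strongly when the critic judges they led to good returns. This is an illustrative stand-in, not the paper's actual objective; `awr_update`, `beta`, and the batch keys are assumptions.

```python
# Sketch (assumptions, not the paper's code) of one offline-RL-style update on
# logged human battles: advantage-weighted behavior cloning. No environment
# interaction is needed -- everything comes from the replay dataset.
import torch
import torch.nn.functional as F

def awr_update(policy, critic, batch, beta=1.0):
    # batch fields are assumed names for tensors sampled from human replays.
    obs, act, ret = batch["obs"], batch["action"], batch["return_to_go"]
    value = critic(obs).squeeze(-1)
    critic_loss = F.mse_loss(value, ret)            # fit V(s) to observed returns
    advantage = (ret - value).detach()
    # Upweight actions that outperformed the value estimate; clamp for stability.
    weight = torch.exp(advantage / beta).clamp(max=20.0)
    logp = policy(obs).log_softmax(-1).gather(1, act.unsqueeze(1)).squeeze(1)
    actor_loss = -(weight * logp).mean()
    return actor_loss + critic_loss
```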
There’s an RL trick where we turn Q-learning into classification. Among other things, it’s a quick fix for multi-task RL’s most unnecessary problem: that the scale of each task’s training loss evolves unevenly over time. It’d be strange to let that happen in supervised learning,…
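One common form of this trick is a "two-hot" cross-entropy loss: the critic outputs logits over a fixed grid of value bins, and the scalar TD target is encoded as a two-hot distribution, so every task trains with a bounded, comparably scaled cross-entropy instead of an MSE whose magnitude drifts with each task's returns. The sketch below is illustrative, not from the thread; `NUM_BINS`, the value range, and the helper names are assumptions.

```python
# Minimal sketch of Q-learning-as-classification via a two-hot target.
import torch

NUM_BINS = 128
V_MIN, V_MAX = -10.0, 10.0  # assumed value range; a real agent would tune this
bin_centers = torch.linspace(V_MIN, V_MAX, NUM_BINS)

def two_hot(target: torch.Tensor) -> torch.Tensor:
    """Encode scalar targets (batch,) as distributions over the fixed bins."""
    target = target.clamp(V_MIN, V_MAX)
    # Fractional position of each target on the bin grid.
    pos = (target - V_MIN) / (V_MAX - V_MIN) * (NUM_BINS - 1)
    lower = pos.floor().long().clamp(0, NUM_BINS - 1)
    upper = (lower + 1).clamp(0, NUM_BINS - 1)
    frac = pos - lower.float()
    dist = torch.zeros(target.shape[0], NUM_BINS)
    dist.scatter_(1, lower.unsqueeze(1), (1.0 - frac).unsqueeze(1))
    dist.scatter_add_(1, upper.unsqueeze(1), frac.unsqueeze(1))
    return dist

def classification_td_loss(logits, reward, next_logits, gamma=0.99):
    """Cross-entropy TD loss; logits are (batch, NUM_BINS) critic outputs."""
    with torch.no_grad():
        # Scalar bootstrap value = expectation over the target critic's bins.
        next_v = (next_logits.softmax(-1) * bin_centers).sum(-1)
        target_dist = two_hot(reward + gamma * next_v)
    return -(target_dist * logits.log_softmax(-1)).sum(-1).mean()
```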

One of RL's most future-proof ideas is that adaptation is just a memory problem in disguise. Simple in theory, but scaling it is hard! Our #ICLR2024 spotlight work AMAGO shows the path to training long-context Transformer models with pure RL. Open-source here: github.com/UT-Austin-RPL/…
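"Memory" here means the policy conditions on its whole recent trajectory, so adapting to a new task reduces to attending over context. Below is a minimal sketch of that input format using a stock PyTorch Transformer encoder over (obs, prev_action, prev_reward) tokens; it is not the AMAGO architecture itself, and all names and dimensions are illustrative.

```python
# Sketch of an in-context policy: one token per timestep, built from the
# observation plus the previous action and reward, fed through a causal
# Transformer so the policy at step t only sees history up to t.
import torch
import torch.nn as nn

class InContextPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, act_dim)

    def forward(self, obs, prev_act, prev_rew):
        # obs: (B, T, obs_dim), prev_act: (B, T, act_dim), prev_rew: (B, T, 1)
        tokens = self.embed(torch.cat([obs, prev_act, prev_rew], dim=-1))
        T = tokens.shape[1]
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        hidden = self.encoder(tokens, mask=mask)
        return self.action_head(hidden)  # per-timestep action logits
```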
Measuring Visual Generalization in Continuous Control from Pixels deepai.org/publication/me… by Jake Grigsby et al. #SupervisedLearning #ReinforcementLearning
An updated version of the #TextAttack paper has been posted, including: • Implementations of 16 papers on NLP attacks • Dozens of attack components (WordNet synonym substitution, BERT sentence encoding, etc.) • 82 pre-trained models Check it out on arXiv: arxiv.org/abs/2005.05909
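For context, a hedged usage sketch of the library: wrapping one of the pre-trained victim models and running a paper-reproducing attack recipe. The model and dataset names are examples, and the `Attacker` interface shown may differ across TextAttack versions; check the repo for current docs.

```python
# Example: run the TextFooler recipe against a pre-trained IMDB classifier.
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Wrap a victim model (any Hugging Face sequence classifier works).
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-imdb")
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# TextFooler composes attack components: counter-fitted word-embedding swaps
# constrained by a sentence-encoder similarity check.
attack = TextFoolerJin2019.build(wrapper)
dataset = HuggingFaceDataset("imdb", split="test")
Attacker(attack, dataset, AttackArgs(num_examples=10)).attack_dataset()
```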