Nathan Lichtlé
@nathanlichtle
PhD @UCBerkeley (@berkeley_ai)
We ran 5,600 hyperparameter sweeps to compare RL algorithms on hidden-information games with billions of states. In our benchmark, we found that properly tuned policy gradient methods, such as PPO, performed the best. Paper: arxiv.org/abs/2502.08938
Model-free deep RL algorithms like NFSP, PSRO, ESCHER, & R-NaD are tailor-made for games with hidden information (e.g. poker). We performed the largest-ever comparison of these algorithms and found that they do not outperform generic policy gradient methods such as PPO. 1/N
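A minimal sketch of what a single sweep setup could look like: drawing one random PPO configuration per run and evaluating each against a fixed metric. The parameter names and ranges below (learning_rate, clip_coef, entropy_coef, gae_lambda, minibatch_size) are illustrative assumptions, not the actual search space used in the paper.

```python
import random

# Assumed search space -- parameter names and ranges are illustrative only.
SEARCH_SPACE = {
    "learning_rate":  lambda rng: 10 ** rng.uniform(-5, -3),    # log-uniform
    "clip_coef":      lambda rng: rng.choice([0.1, 0.2, 0.3]),
    "entropy_coef":   lambda rng: 10 ** rng.uniform(-4, -1),    # log-uniform
    "gae_lambda":     lambda rng: rng.choice([0.9, 0.95, 1.0]),
    "minibatch_size": lambda rng: rng.choice([256, 1024, 4096]),
}

def sample_config(seed: int) -> dict:
    """Draw one PPO configuration for a single sweep run."""
    rng = random.Random(seed)
    return {name: sample(rng) for name, sample in SEARCH_SPACE.items()}

# One config per run; each would then be trained and scored on the benchmark.
configs = [sample_config(seed) for seed in range(5600)]
print(configs[0])
```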
New Blog Post: Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment bair.berkeley.edu/blog/2025/03/2…
This paper has everything: large-scale empirical testing, new benchmarks, and empirical recommendations that we think might make solving imperfect-information games a little easier.
Tic-Tac-Toe... but the opponent's moves are hidden. Can you outsmart our top RL agents? Play here: nathanlichtle.com/research/2p0s
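For readers wondering what "hidden moves" means mechanically, here is a minimal sketch of a phantom tic-tac-toe environment: each player only observes their own marks, and probing a square the opponent secretly holds reveals it. The reveal-and-move-again rule is an assumption borrowed from the classical phantom variant; the game on the site may use different rules.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

class PhantomTicTacToe:
    """Tic-tac-toe where each player only sees their own marks."""

    def __init__(self):
        self.board = [None] * 9             # ground-truth board, never shown directly
        self.seen = {0: set(), 1: set()}    # squares each player has information about
        self.to_move = 0

    def observation(self, player):
        """What `player` can see: only squares they have played or probed."""
        return {sq: self.board[sq] for sq in self.seen[player]}

    def play(self, square):
        """Current player tries `square`. If the opponent secretly holds it,
        the square is revealed and the same player moves again (assumed rule)."""
        player = self.to_move
        self.seen[player].add(square)
        if self.board[square] is not None:
            return "revealed"               # opponent's hidden mark uncovered
        self.board[square] = player
        self.to_move = 1 - player
        return "placed"

    def winner(self):
        for a, b, c in WIN_LINES:
            if self.board[a] is not None and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None
```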
Can language models be trained to find solutions to as-yet-unsolved mathematical problems? The answer is yes! Check out our new article 🙂 1/n
Transformers can be trained to solve a 132-year-old open problem: discovering global Lyapunov functions. New paper on arXiv (accepted at NeurIPS 2024), with @albe_alfa and @Amaury_Hayat arxiv.org/abs/2410.08304 1/8
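As a toy illustration of the verification step behind such results: given a candidate V(x) (e.g. one proposed by a trained model), one can check symbolically that V is positive definite and that its derivative along the system's trajectories is non-positive. The system and candidate below are standard textbook examples chosen for illustration, not instances from the paper, and the check uses SymPy.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = [-x1**3 - x2, x1 - x2**3]    # dynamics dx/dt = f(x)
V = x1**2 + x2**2                # candidate Lyapunov function

# Lie derivative along trajectories: dV/dt = grad(V) . f(x)
Vdot = sp.expand(sp.diff(V, x1) * f[0] + sp.diff(V, x2) * f[1])
print("dV/dt =", Vdot)           # -2*x1**4 - 2*x2**4, which is <= 0 everywhere

# V(0) = 0 and V > 0 away from the origin, so together with dV/dt <= 0
# this V certifies stability of the origin for this toy system.
```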
We’re open-sourcing and arXiving GPUDrive, a GPU-accelerated 2.5D multi-agent driving simulator that runs at over a million FPS. Hundreds of scenes on one GPU means scalable multi-agent planning.
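This is not GPUDrive's actual API, but a schematic (written in PyTorch as an assumption) of the batched-stepping idea behind that kind of throughput: every scene and every agent lives in one set of tensors, so a single call advances all of them at once and throughput scales with the batch.

```python
import torch

num_scenes, agents_per_scene, dt = 256, 64, 0.1
device = "cuda" if torch.cuda.is_available() else "cpu"

# [scenes, agents, (x, y, heading, speed)]
state = torch.zeros(num_scenes, agents_per_scene, 4, device=device)

def step(state, actions):
    """Advance every agent in every scene with one batched kinematics update.
    `actions` is [scenes, agents, (accel, steer)]."""
    x, y, heading, speed = state.unbind(-1)
    accel, steer = actions.unbind(-1)
    speed = speed + accel * dt
    heading = heading + steer * dt
    x = x + speed * torch.cos(heading) * dt
    y = y + speed * torch.sin(heading) * dt
    return torch.stack([x, y, heading, speed], dim=-1)

actions = torch.randn(num_scenes, agents_per_scene, 2, device=device)
state = step(state, actions)     # 256 * 64 = 16,384 agent steps per call
```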
Excited to share our work on the MVT, a large-scale highway field test in which we trained autonomous vehicles to smooth out traffic flow using deep reinforcement learning, then deployed our AI controllers onto 100 vehicles in busy morning traffic. Read more below! (1/n)
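For intuition only: a hand-written baseline that captures the wave-smoothing idea, where the controlled vehicle tracks a low-pass-filtered estimate of the leader's speed rather than reacting to every braking event, which is what propagates stop-and-go waves backwards. The function, gains, and thresholds (smoothing_controller, alpha, min_gap, k) are hypothetical illustrations, not the deployed RL policy.

```python
def smoothing_controller(ego_speed, lead_speed, gap, smoothed_lead,
                         alpha=0.05, min_gap=10.0, k=0.5):
    """Return (acceleration command, updated smoothed-leader-speed estimate)."""
    # Low-pass filter the leader's speed so the ego ignores high-frequency
    # stop-and-go oscillations instead of amplifying them.
    smoothed_lead = (1 - alpha) * smoothed_lead + alpha * lead_speed
    target = smoothed_lead
    if gap < min_gap:                # simple safety override when too close
        target = min(target, lead_speed)
    return k * (target - ego_speed), smoothed_lead

# Example step: ego at 25 m/s, leader braking to 20 m/s, 30 m gap.
accel, est = smoothing_controller(ego_speed=25.0, lead_speed=20.0,
                                  gap=30.0, smoothed_lead=25.0)
print(accel, est)
```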